SIMULTANEOUS CONFIDENCE INTERVAL ESTIMATION
By S. N. Roy and R. C. Bose


University of North Carolina 


Summary. The work of Neyman on confidence limits and of Fisher on fiducial 
limits is well known. However, in most applications the interval or limits for 
only a single parameter or a single function of the parameters has been con- 
sidered. Recently Scheffé [2] and Tukey [3] have considered special cases of 
what may be called problems of simultaneous estimation, in which one is in- 
terested in giving confidence intervals for a finite or infinite set of parametric 
functions such that the probability of the parametric functions of the set being 
simultaneously covered by the corresponding intervals is a preassigned number
$1 - \alpha$ $(0 < \alpha < 1)$.

In this paper we discuss in Section 1, a set of sufficient conditions under 
which such simultaneous estimation is possible, and bring out the connection 
of this with a method of test construction considered by one of the authors in 
a previous paper [1]. 

In Section 2 some univariate examples (including the ones due to Scheffé and 
Tukey) are considered from this point of view. Sections 3 to 6 are concerned 
with multivariate applications, giving results which are believed to be new. The 
associated tests all turn out to be the same as in [1] except for the example in 
Section 4.3 which, in a sense, is a multivariate generalization of Tukey’s exam- 
ple (Section 2.2). Section 3 gives the notation and preliminaries for multivariate 
applications. Section 4 gives confidence bounds on linear functions of means 
for multivariate normal populations. Sections 5 and 6 give respectively confi- 
dence bounds on certain functions of the elements of population covariance 
matrices and population canonical regressions, from which a chain of simpler 
consequences would follow by the application of a set of matrix theorems. This 
has been partly indicated in the present paper and will be more fully discussed 
in a later paper. 


1. Introductory remarks on simultaneous estimation. 
1.1. Let $y = (y_1, y_2, \cdots, y_n)$ be an observed set of random variables, whose
joint distribution depends on the set of unknown parameters

    $\theta = (\theta_1, \theta_2, \cdots, \theta_m)$.

Let

(1.1.1)    $\Pi_k = f_k(\theta)$

be a set of functions of the parameters, where the index $k$ belongs to a finite or
infinite set $\Omega$. We shall consider the problem of making simultaneous confidence
statements

(1.1.2)    $\phi_{k1}(y) \le \Pi_k \le \phi_{k2}(y)$

with confidence coefficient $1 - \alpha$, which gives the probability that the
statements (1.1.2) are simultaneously true for all $k \in \Omega$.

This problem can in particular be solved under the following circumstances.
Suppose it is possible to find a set of functions

(1.1.3)    $\psi_k(y, \Pi_k)$, $k \in \Omega$,

such that

(1.1.4)    $d_1 \le \psi_k \le d_2$, $k \in \Omega$,

implies (1.1.2) and conversely, where $d_1$ and $d_2$ are constants independent of $k$.
For a given $\theta$, let

(1.1.5)    $W_{k\theta} = \{y : d_1 \le \psi_k \le d_2 \mid \theta\}$

be the set of those points $y$ in the sample space $R_n$ for which (1.1.4) is satisfied.
Let

(1.1.6)    $W_\theta = \bigcap_{k \in \Omega} W_{k\theta}$

be the intersection of the sets (1.1.5). If $W_\theta$ is a Borel set for each admissible
$\theta$ and

(1.1.7)    $\Pr\{y \in W_\theta \mid \theta\} = 1 - \alpha$, $0 < \alpha < 1$,

is independent of the parameters, then $1 - \alpha$ is also the chance that the
statements (1.1.2) are simultaneously true for all $k \in \Omega$.

PROOF. If the sample point $y$ belongs to $W_\theta$, then (1.1.4) is true for all
$k \in \Omega$, and the same holds for (1.1.2). Conversely if (1.1.2) is true for all
$k \in \Omega$, then the same holds for (1.1.4). Consequently the sample point $y$
belongs to $W_\theta$. Thus the statements (1.1.2) are simultaneously true when and
only when $y \in W_\theta$, and the chance for this is by hypothesis $1 - \alpha$.

REMARK. We note that $W_\theta$ is the set of points $y$ which satisfy both the
inequalities

(1.1.8)    $d_1 \le \inf_k \psi_k(y, \Pi_k)$ and $\sup_k \psi_k(y, \Pi_k) \le d_2$,

and if supremum and infimum over $k$ can be simply expressed, $W_\theta$ is simply
defined. The choice of $\psi_k(y, \Pi_k)$ in (1.1.3) can be made in very many ways and
there is of course a set of simultaneous confidence intervals corresponding to
each choice. In all the examples considered in this paper we have used a uniform
principle of choice discussed in [1], which, in the present context, can be
indicated as follows. In trying to construct a set of confidence bounds (with a
joint confidence coefficient, say $1 - \alpha$) for an (infinite) set of parametric
functions, consider, to begin with, each such parametric function separately, and
with it associate the customary confidence interval with a confidence coefficient,
say $1 - \beta$ ($> 1 - \alpha$). In all the examples considered, these customary
confidence intervals (for the separate parametric functions) are well known to have
more or less strong optimum properties, which have also been indicated in [1]. The
next step in any problem is to consider the intersection of this (infinite) set of
confidence intervals associated with the corresponding (infinite) set of separate
or individual parametric functions, and to use this intersection for simultaneous
confidence interval estimation with a joint confidence coefficient, say $1 - \alpha$
(naturally $< 1 - \beta$). Given $\alpha$, we can determine $\beta$, and vice versa.
When we start with "good" or "optimum" intervals for the individual parametric
functions, it is of course important to be able to decide how "good" the resulting
joint confidence bounds are, either in general or in the particular problems
considered, especially the multivariate ones, and whether these are in any sense the
"best." In this connection all we have done in the present paper is to indicate
certain operating characteristics of the resultant joint confidence bounds actually
considered, which we hope to follow up by some further discussion along
the same lines in a later paper.

1.2. Let $H_0$ be a hypothesis regarding the parameters, which fixes the value
of $\Pi_k = f_k(\theta)$ for all $k \in \Omega$. Thus let $\Pi_k = \Pi_{k0}$ for
$k \in \Omega$ if $H_0$ is true. Conversely let $\Pi_k = \Pi_{k0}$ for all
$k \in \Omega$ imply the truth of $H_0$. Then a test of the hypothesis $H_0$ is
obtained by rejecting $H_0$ when and only when at least one of the statements

(1.2.1)    $\phi_{k1}(y) \le \Pi_{k0} \le \phi_{k2}(y)$, $k \in \Omega$,

is false. It is evident that the size of the test is $\alpha$, since $1 - \alpha$ is
the chance for the statements (1.2.1) to be simultaneously true. The region
$W_{\theta_0}$ remains the same for all sets of parameters
$\theta_0 = (\theta_{10}, \theta_{20}, \cdots, \theta_{m0})$ for which $H_0$ is
satisfied. To calculate $W_{\theta_0}$ we can therefore take any set of values for the
parameters consistent with $H_0$. The critical region for rejecting $H_0$ is then
$\bar{W}_{\theta_0}$, the complement of $W_{\theta_0}$. The power of the test against an
alternative $H$ for which the parameter is $\theta$ is

(1.2.2)    $1 - \Pr\{y \in W_{\theta_0} \mid \theta\}$.
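
The rule of Section 1.2 can be sketched in a few lines for a finite index set. This is only an illustration: the function names and the list-of-pairs representation of the intervals are assumptions made here, not notation from the paper.

```python
def all_covered(intervals, hypothesized):
    """True when every interval (lo, hi) covers its hypothesized value.

    intervals    : list of (lo, hi) pairs, one per index k (finite set assumed)
    hypothesized : list of the values Pi_k0 fixed by H0, in the same order
    """
    return all(lo <= value <= hi
               for (lo, hi), value in zip(intervals, hypothesized))


def reject_h0(intervals, hypothesized):
    """Reject H0 when and only when at least one statement (1.2.1) is false."""
    return not all_covered(intervals, hypothesized)


# Two simultaneous intervals and two candidate hypotheses.
bounds = [(-1.0, 1.0), (0.0, 2.0)]
print(reject_h0(bounds, [0.0, 1.0]))  # every value is covered
print(reject_h0(bounds, [0.0, 3.0]))  # the second value falls outside
```

If the intervals have joint confidence coefficient $1 - \alpha$, the rule above has size $\alpha$ by the argument of Section 1.2.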


2. Applications to univariate simultaneous estimation problems. 
2.1. Let $y_1, y_2, \cdots, y_n$ be independent normal variates with common variance
$\sigma^2$ (unknown), and let

    $E(y_i) = a_{i1}\theta_1 + a_{i2}\theta_2 + \cdots + a_{im}\theta_m$, $i = 1, 2, \cdots, n$,

where $\theta_1, \theta_2, \cdots, \theta_m$ are unknown parameters, and
$n_0 = \mathrm{rank}\,(a_{ij}) \le m < n$.
A linear function $\Pi$ of the parameters $\theta_1, \theta_2, \cdots, \theta_m$ is said
to be linearly estimable if there exists a linear function $Y$ of the variates such
that $E(Y) = \Pi$.

In this case $Y$ is said to be an unbiased linear estimate of $\Pi$. The unbiased
linear estimate with the minimum variance is called the best linear estimate
of $\Pi$.

Consider the problem of simultaneous estimation for a set of linear functions

(2.1.1)    $\Pi_k = l_{k1}\theta_1 + l_{k2}\theta_2 + \cdots + l_{km}\theta_m$

such that the coefficient vectors $(l_{k1}, l_{k2}, \cdots, l_{km})$ form a vector space
$V_l$ of rank $n_1 \le n_0$. Let

(2.1.2)    $Y_k = c_{k1}y_1 + c_{k2}y_2 + \cdots + c_{kn}y_n$

be the best linear estimate of $\Pi_k$. Then the coefficient vectors
$(c_{k1}, c_{k2}, \cdots, c_{kn})$ form a vector space $V$ of rank $n_1$, and it is
possible to choose $n_1$ mutually orthogonal vectors

    $(g_{i1}, g_{i2}, \cdots, g_{in})$, $i = 1, 2, \cdots, n_1$,

of unit length lying in $V$. In the remainder of Section 2.1, we shall suppose the
subscript $i$ to range over the values $1, 2, \cdots, n_1$. Let

    $U_i = g_{i1}y_1 + g_{i2}y_2 + \cdots + g_{in}y_n$, $E(U_i) = \Phi_i$.

Then there exist constants $b_{k1}, b_{k2}, \cdots, b_{kn_1}$ such that

    $Y_k = b_{k1}U_1 + b_{k2}U_2 + \cdots + b_{kn_1}U_{n_1}$,
    $\Pi_k = b_{k1}\Phi_1 + b_{k2}\Phi_2 + \cdots + b_{kn_1}\Phi_{n_1}$.

Conversely each set of constants $b_{k1}, b_{k2}, \cdots, b_{kn_1}$ determines a unique
$\Pi_k$ and $Y_k$ belonging to (2.1.1) and (2.1.2) respectively, so that the index $k$
is in (1,1) correspondence with the set $(b_{k1}, b_{k2}, \cdots, b_{kn_1})$.

Also $U_1, U_2, \cdots, U_{n_1}$ are independently distributed normal variates with
variance $\sigma^2$ and

    $V(Y_k) = (b_{k1}^2 + b_{k2}^2 + \cdots + b_{kn_1}^2)\sigma^2$.

Let $s^2$ be an independent estimate of $\sigma^2$ based on $n_2$ degrees of freedom.
Then an estimate of $V(Y_k)$ is given by

    $\hat{V}(Y_k) = (b_{k1}^2 + b_{k2}^2 + \cdots + b_{kn_1}^2)s^2$.

Let us set

(2.1.3)    $\psi_k = \dfrac{b_{k1}(U_1 - \Phi_1) + b_{k2}(U_2 - \Phi_2) + \cdots + b_{kn_1}(U_{n_1} - \Phi_{n_1})}{s\sqrt{b_{k1}^2 + b_{k2}^2 + \cdots + b_{kn_1}^2}} = \dfrac{Y_k - \Pi_k}{\sqrt{\hat{V}(Y_k)}}$;

then

(2.1.4)    $-d \le \psi_k \le d$

implies

    $Y_k - d\sqrt{\hat{V}(Y_k)} \le \Pi_k \le Y_k + d\sqrt{\hat{V}(Y_k)}$.

Since

    $\sup_k \psi_k = +\left\{\sum_{i=1}^{n_1} (U_i - \Phi_i)^2 / s^2\right\}^{1/2}$

and

    $\inf_k \psi_k = -\left\{\sum_{i=1}^{n_1} (U_i - \Phi_i)^2 / s^2\right\}^{1/2}$,

it follows from the remark at the end of Section 1.1 that $W_\theta$, the intersection
of the regions (2.1.4), is given by

(2.1.5)    $W_\theta = \left\{\sum_i (U_i - \Phi_i)^2 / s^2 \le d^2\right\}$.

Now $\sum_i (U_i - \Phi_i)^2 / n_1 s^2$ is distributed as $F$ with degrees of freedom
$n_1$, $n_2$. Hence if we put $d = \sqrt{n_1 F_\alpha}$, where $F_\alpha = F_\alpha(n_1, n_2)$
is the upper $\alpha$-point of the $F$-distribution with $n_1$, $n_2$ degrees of freedom,
then the chance for $y_1, y_2, \cdots, y_n$ to lie in $W_\theta$ is $1 - \alpha$. Hence
we get the simultaneous confidence intervals

(2.1.6)    $Y_k - \sqrt{n_1 F_\alpha \hat{V}(Y_k)} \le \Pi_k \le Y_k + \sqrt{n_1 F_\alpha \hat{V}(Y_k)}$

with confidence coefficient $1 - \alpha$, for the set of parametric functions (2.1.1).
This is essentially Scheffé's [2] result when expressed in the general linear form.
It should be noted that the confidence intervals (2.1.6) are independent of the
linear functions $U_i$.
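
A minimal sketch of the interval (2.1.6) for a single linear function may help fix ideas. It assumes that the caller already has the coefficients $b_{k1}, \cdots, b_{kn_1}$ on the orthonormal $U_i$ and supplies the upper point $F_\alpha(n_1, n_2)$ from a table; the function name `scheffe_interval` and its argument names are hypothetical.

```python
import math


def scheffe_interval(y_k, b_k, s2, f_crit):
    """Interval (2.1.6) for one linear function Pi_k.

    y_k    : best linear estimate Y_k
    b_k    : coefficients b_k1, ..., b_kn1 (length taken to be the rank n1)
    s2     : independent estimate s^2 of sigma^2 on n2 degrees of freedom
    f_crit : upper alpha-point F_alpha(n1, n2), supplied by the caller
    """
    n1 = len(b_k)
    var_hat = sum(b * b for b in b_k) * s2          # estimated V(Y_k)
    half_width = math.sqrt(n1 * f_crit * var_hat)   # sqrt(n1 * F_alpha * V_hat)
    return y_k - half_width, y_k + half_width


# Illustrative numbers only: n1 = 2, s^2 = 0.5, and an assumed table value 3.0.
lo, hi = scheffe_interval(1.0, [1.0, 0.0], 0.5, 3.0)
print(lo, hi)
```

The same critical value is used for every $k$, which is what makes the intervals hold simultaneously over the whole space of linear functions.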

Again suppose we wish to test the hypothesis $H_0$ that any $n_1$ independent
linear functions belonging to the set (2.1.1) vanish. This is equivalent to the
vanishing of $\Phi_i$, $i = 1, 2, \cdots, n_1$. It follows from Section 1.2 that a test of
the hypothesis $H_0$ is obtained by using the region of rejection

(2.1.7)    $\sum_i U_i^2 / n_1 s^2 > F_\alpha$.

Thus we get the usual $F$-test of the hypothesis $H_0$.
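2.2. Let $y_1, y_2, \cdots, y_n$ be normal variates for which

(2.2.1)    $E(y_i) = \theta_i$, var $(y_i) = \sigma^2$, $i = 1, 2, \cdots, n$,

(2.2.2)    cov $(y_i, y_j) = \rho\sigma^2$, $i, j = 1, 2, \cdots, n$, $i \ne j$,

where $\rho$ is known, $\theta_i$ and $\sigma^2$ are unknown, but an independent estimate
$s^2$ of $\sigma^2$ based on $n'$ degrees of freedom is available. It is required to
obtain a simultaneous estimate of the mean differences

(2.2.3)    $\theta_i - \theta_j$, $i, j = 1, 2, \cdots, n$, $i \ne j$.

In contradistinction to the example considered in Section 2.1, we have now
a finite set of parametric functions. Let $z_i + \kappa\bar{\theta} = y_i + \kappa\bar{y}$ where
$\bar{y} = (y_1 + y_2 + \cdots + y_n)/n$, $\bar{\theta} = (\theta_1 + \theta_2 + \cdots + \theta_n)/n$
and the disposable constant $\kappa$ is so adjusted that the $z_i$'s are uncorrelated. Then

(2.2.4)    $E(z_i) = \theta_i$, var $(z_i) = \sigma^2(1 - \rho)$,

and, since $z_i - z_j = y_i - y_j$, the statement (2.2.6) that
$|z_i - z_j - \theta_i + \theta_j| / s\sqrt{1 - \rho}$ does not exceed $d$ implies

(2.2.7)    $y_i - y_j - sd\sqrt{1 - \rho} \le \theta_i - \theta_j \le y_i - y_j + sd\sqrt{1 - \rho}$.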

Let $W_\theta$ be the intersection of the regions (2.2.6). Then clearly the necessary
and sufficient condition for the sample point to lie in $W_\theta$ is that

(2.2.8)    $\dfrac{w}{s\sqrt{1 - \rho}} \le d$

where

(2.2.9)    $w = \sup_{i,j} |(z_i - \theta_i) - (z_j - \theta_j)|$, $i, j = 1, 2, \cdots, n$, $i \ne j$.

Thus if we set $d = q_\alpha(n, n')$, where $q_\alpha(n, n')$ is the upper $\alpha$-point of
the distribution of the studentized range with $n$, $n'$ degrees of freedom, that is the
ratio of the range of $n$ independent normal variates with zero mean to the square root
of an independent estimate of their common variance based on $n'$ degrees of
freedom, then the required simultaneous confidence intervals for the parametric
functions (2.2.3) are

(2.2.10)    $y_i - y_j - s\,q_\alpha(n, n')\sqrt{1 - \rho} \le \theta_i - \theta_j \le y_i - y_j + s\,q_\alpha(n, n')\sqrt{1 - \rho}$.

This result is due to Tukey [3]. In particular $y_1, y_2, \cdots, y_n$ may be the
means of $n$ random samples of equal size drawn from normal populations with
a common (unknown) variance, or may be the estimated treatment effects in a
randomized block or a balanced incomplete block experiment.
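
The intervals (2.2.10) and the associated range test can be sketched as follows. The critical value $q_\alpha(n, n')$ is taken as an input (for instance from a studentized range table); the function names and the 0-based pair indices are conventions adopted here for illustration only.

```python
import math
from itertools import combinations


def tukey_intervals(y, s, q_crit, rho=0.0):
    """Simultaneous intervals (2.2.10) for all differences theta_i - theta_j.

    y      : list of the variates y_1, ..., y_n
    s      : square root of the independent variance estimate s^2
    q_crit : upper alpha-point q_alpha(n, n'), supplied by the caller
    rho    : known common correlation between the y's
    """
    half_width = s * q_crit * math.sqrt(1.0 - rho)
    return {(i, j): (y[i] - y[j] - half_width, y[i] - y[j] + half_width)
            for i, j in combinations(range(len(y)), 2)}


def range_test_rejects(y, s, q_crit, rho=0.0):
    """Reject theta_1 = ... = theta_n when R / (s sqrt(1 - rho)) > q_alpha."""
    sample_range = max(y) - min(y)
    return sample_range / (s * math.sqrt(1.0 - rho)) > q_crit


# Illustrative numbers only; 4.0 is an assumed critical value.
y = [1.0, 3.0, 6.0]
print(tukey_intervals(y, 1.0, 4.0))
print(range_test_rejects(y, 1.0, 4.0))
```

The range test rejects exactly when at least one of the intervals excludes zero, in line with Section 1.2.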

We can test the hypothesis $H_0$ that

(2.2.11)    $\theta_1 = \theta_2 = \cdots = \theta_n$

by using as the region of rejection

    $\dfrac{R}{s\sqrt{1 - \rho}} > q_\alpha(n, n')$

where $R = \sup_{i,j} |y_i - y_j|$ is the range of the random variates $y_1, y_2, \cdots, y_n$.
Thus we arrive at a test different from the classical analysis of variance test.
2.3. In factorial experiments we are usually interested in estimating linear 
functions of treatment effects, whose estimates are independently and normally 
distributed with a common variance, which can be independently estimated by 
an appropriate multiple of the error mean square in the analysis of variance. 
The distribution needed for simultaneous estimation in this case is slightly
different from that occurring in Section 2.2.

Suppose, for example, that we have observations for a 2 × 2 × 2 × 2 factorial
experiment with factors A, B, C, D, and that we are interested in simultaneously
estimating the main effects and two factor interactions only. We shall
suppose that the experiment is so laid out that none of these is confounded in
any replication. Let $t_{11}, t_{22}, t_{33}, t_{44}$ denote the true main effects and
$t_{12}, t_{13}, t_{14}, t_{23}, t_{24}, t_{34}$ the true two factor interactions. The order of
the subscripts in $t_{ij}$ is immaterial, that is, $t_{ij} = t_{ji}$. We can then write in
the usual notation

(2.3.1)    $t_{11} = \frac{1}{8}(a - 1)(b + 1)(c + 1)(d + 1)$
(2.3.2)    $t_{12} = \frac{1}{8}(a - 1)(b - 1)(c + 1)(d + 1)$

with similar expressions for other main effects and interactions. Let $y_{ij}$ be the
estimate of $t_{ij}$. Then reasoning as before we get the following simultaneous
confidence intervals for $t_{ij}$:

(2.3.3)    $y_{ij} - s\,x_\alpha(n, n') \le t_{ij} \le y_{ij} + s\,x_\alpha(n, n')$

where $s^2$ is an estimate of $V(y_{ij})$, based on $n'$ degrees of freedom available for
the estimate of error, and where $n$, which is 10 in this particular example, is the
number of linear functions to be estimated.

The meaning of $x_\alpha(n, n')$ is as follows. Let $x_1, x_2, \cdots, x_n$ be independent
normal variates with zero mean and variance $\sigma^2$. Let $|x|$ be the maximum of
$|x_1|, |x_2|, \cdots, |x_n|$ and let $s^2$ be an independent estimate of $\sigma^2$ based on
$n'$ degrees of freedom. Then $x_\alpha(n, n')$ is the upper $\alpha$-point of the
distribution of $|x|/s$.
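
Where no table of $x_\alpha(n, n')$ is at hand, the definition above lends itself to a rough Monte Carlo approximation. The sketch below is an illustration under stated assumptions, not a method given in the paper: it takes $\sigma = 1$ without loss of generality, draws $s^2$ as a chi-square variate divided by $n'$, and reads off an empirical quantile. The function names, the replication count and the seed are arbitrary choices.

```python
import math
import random


def simulate_max_modulus(n, n_prime, reps=20000, seed=0):
    """Sorted Monte Carlo draws of max|x_i| / s under sigma = 1."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        max_abs = max(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
        # s^2 is an independent estimate of sigma^2 on n' degrees of freedom,
        # simulated as a chi-square variate divided by n'.
        s2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_prime)) / n_prime
        draws.append(max_abs / math.sqrt(s2))
    draws.sort()
    return draws


def upper_point(sorted_draws, alpha):
    """Empirical upper alpha-point: at most a fraction alpha of draws exceed it."""
    m = len(sorted_draws)
    index = min(m - 1, max(0, math.ceil((1.0 - alpha) * m) - 1))
    return sorted_draws[index]


# n = 10 linear functions, as in the factorial example; n' = 20 is illustrative.
draws = simulate_max_modulus(10, 20, reps=5000, seed=1)
print(upper_point(draws, 0.05), upper_point(draws, 0.01))
```

The result is only as accurate as the number of replications allows, so it should be read as an approximation to the tabulated point rather than a replacement for it.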


A test of the hypothesis $H_0$ that all the linear functions $t_{ij}$ to be estimated
are simultaneously zero is obtained by using as the region of rejection

(2.3.4)    $\sup_{i,j} |y_{ij}| \ge s\,x_\alpha(n, n')$.


In a factorial experiment in which each factor is at more than two levels, the 
above result will still apply if the linear functions to be simultaneously esti- 
mated (or tested for vanishing) are so chosen that their estimates are inde- 
pendently distributed with a common variance. 

The use of $x_\alpha(n, n')$ to solve an equivalent problem was introduced
independently by J. W. Tukey at the same session of the meeting of the Institute
of Mathematical Statistics (Chicago, 1952) at which the authors first presented
their own results.


3. Notation and preliminaries for multivariate applications. As far as possible
Greek letters will stand for population parameters and Italic letters over the
first half of the alphabet for given (nonstochastic) quantities and over the latter
part from, say, s to the end for sample quantities, capital letters for matrices,
small letters for scalars, small letters underscored for column vectors and for
row vectors if they are primed. Some exceptions to this, which are unavoidable,
will be clearly indicated at the proper places. As usual the transpose of a matrix
or a column vector will be denoted by priming such quantities. The absolute
value of the determinant of a square matrix $M$ will be denoted by $|M|$ and the
absolute value of a scalar $m$ by $|m|$. To indicate the structure, a $p \times q$ matrix,
say $M$, or a $p \times 1$ column vector, say $m$, will sometimes be written respectively
as $M(p \times q)$ or $m(p \times 1)$. The terms "positive definite" and "positive
semi-definite" will be abbreviated p.d. and p.s.d. respectively. "Almost everywhere,"
that is "except for a set of (probability) measure zero," will be referred to as a.e.
A matrix $B$ whose typical element is $b_{ij}$ will sometimes be denoted by $(b_{ij})$. A
diagonal matrix whose diagonal elements are, say, $a_1, a_2, \cdots, a_p$ will be
denoted by $D_a$.

A normal variate $x$ with mean $\xi$ and variance $\sigma^2$ will be called $N(\xi, \sigma^2)$.
A column vector $x(p \times 1)$ whose components have a $p$-variate normal distribution
about a mean vector $\xi(p \times 1)$ and with a covariance matrix $\Sigma(p \times p)$ will
be called $N(\xi, \Sigma)$. The matrix $\Sigma$ is a symmetric and always at least a p.s.d.
matrix. In the problems we shall be discussing in this paper this $\Sigma$ will be
assumed to be p.d. A random sample $X(p \times (n + 1))$ of $(n + 1)$ individuals from
an $N(\xi, \Sigma)$ will have the probability density

    $(2\pi)^{-p(n+1)/2}\,|\Sigma|^{-(n+1)/2} \exp\left[-\tfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}(X - \xi)(X' - \xi')\right]$

where $\xi(p \times (n + 1))$ stands for a $p \times (n + 1)$ matrix each column of which
is the $p \times 1$ vector $\xi$ already defined. Notice that in the matrix $X$ any element
in the $i$th row and $j$th column is to be called $x_{ij}$ where $i = 1, 2, \cdots, p$ and
$j = 1, 2, \cdots, n + 1$ and where $i$ stands for a variate and $j$ for an individual.
A matrix $X$ having the above probability law will be called an $X$: $N(\xi, \Sigma)$. Also
let $\bar{x}_i$ be the mean over $j$ of $x_{ij}$ and let $x' = (\bar{x}_1, \cdots, \bar{x}_p)$. It is
well known that by an orthogonal transformation we can change over from
$X(p \times (n + 1))$ to $(Y, \sqrt{n + 1}\,x)$, where

    $YY' = nS(p \times p) = XX' - (n + 1)xx'$,

$S$ being the sample covariance matrix, and where $Y(p \times n)$ and $x(p \times 1)$
are independent and have the respective probability densities

    Const. $\exp\left[-\tfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}YY'\right]$

and

    Const. $\exp\left[-\tfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}(n + 1)(x - \xi)(x' - \xi')\right]$.

Any $Y(p \times n)$ having the above distribution can be called $Y$: $N(0, \Sigma)$. For
problems on covariance matrices or canonical correlations or regressions we
shall start not from $X(p \times (n + 1))$: $N(\xi, \Sigma)$, but directly from $Y(p \times n)$:
$N(0, \Sigma)$. As is well known there is a lot of arbitrariness in $Y$, but this does not
matter in the results we are ordinarily interested in, because all such results
ultimately come out in terms of $x$ and $YY'$, that is, $S$. In Sections 3, 4 and 5 of
this paper which, in a sense, constitute a follow-up of a previous paper [1],
repeated use is made of the fact that if $x(p \times 1)$ is $N(\xi, \Sigma)$, then, for a
fixed, that is, nonstochastic $a(p \times 1)$, the scalar $a'x$ is $N(a'\xi, a'\Sigma a)$, and
thus multivariate problems are thrown back on univariate and bivariate problems
exactly in the same manner as in the previous paper. We also make repeated use of
the result that tr $A(p \times q)\,B(q \times p)$ = tr $BA$.
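
The two facts used repeatedly, the variance $a'\Sigma a$ of a linear compound and the identity tr $AB$ = tr $BA$, can be checked numerically on small matrices. The helper names below are hypothetical and the matrices are arbitrary illustrative values held as nested lists.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists (rows of A times columns of B)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def quad_form(a, M):
    """The scalar a' M a, e.g. the variance of a'x when M is the covariance matrix."""
    return sum(a[i] * M[i][j] * a[j]
               for i in range(len(a)) for j in range(len(a)))


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [0, 1], [2, 2]]      # 3 x 2
print(trace(matmul(A, B)), trace(matmul(B, A)))   # the two traces agree

sigma = [[2, 1], [1, 3]]          # an illustrative covariance matrix
print(quad_form([1, -1], sigma))  # variance of x_1 - x_2
```

Note that $AB$ is $2 \times 2$ and $BA$ is $3 \times 3$ here, so the traces agree although the products differ in size.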


4. Multivariate estimation and testing problems on means. In three subsec- 
tions under this section we shall consider three estimation problems each coupled 
with a corresponding problem in testing hypotheses. It will be evident from the 
titles to the subsections that the first problem is, in a sense, a special case of the 
second and the second of the third. But for expository purposes and from con- 
siderations of practical usefulness there is an advantage in discussing the three 
cases separately and in order of increasing generality and difficulty. It may 
be also noted that so far as testing of hypotheses is concerned, out of the three 
major problems considered in Subsection 4.1, 4.2 and 4.5 of this section the 
last two have been already discussed in a previous paper [1] and the associated 
tests offered there are precisely the same as are obtained here by inverting the 
confidence estimation procedures. In the discussion of the estimation problems 
we shall be concerned with the probabilities of covering both the true and false 
values of the parameters being estimated. We shall refer to these as the probabili- 
ties under the null hypothesis and an alternative respectively, and shall employ 
the same terminology for the associated distributions of the statistics that 
define the boundaries of the confidence sets.

4.1. Estimation and testing problem on $\xi$ from an $N(\xi, \Sigma)$. Given an
$X(p \times (n + 1))$: $N(\xi, \Sigma)$, suppose we try to obtain simultaneous confidence
bounds on arbitrary linear compounds of the population mean vector $\xi$. Consider
the statement that

    $(n + 1)^{1/2}\,|a'(x - \xi)| / (a'Sa)^{1/2} \le c$,

or

(4.1.1)    $(n + 1)\,a'(x - \xi)(x' - \xi')a / a'Sa \le c^2$,

where $x$ is the sample mean vector and $S$ is the sample covariance matrix, already
defined in Section 3, and $a(p \times 1)$ is an arbitrary nonnull nonstochastic column
vector and $c$ is a given positive constant. The statement (4.1.1) stems from the
customary Student's $t$-test and the associated confidence interval (both having
well known optimum properties) relating to the parameter $a'\xi$. Now, for a given
(positive) $c$ and given $x$, $\xi$, $S$ and of course $n$, the set of all statements (4.1.1)
for all possible nonnull vectors $a$ is exactly equivalent to the statement that

(4.1.2)    $\sup_a\,(n + 1)\,a'(x - \xi)(x' - \xi')a / a'Sa \le c^2$.

It is well known that this "sup" comes out as tr $(n + 1)S^{-1}(x - \xi)(x' - \xi')$, or
as tr $(n + 1)(x' - \xi')S^{-1}(x - \xi)$ (since tr $AB$ = tr $BA$), or simply as
$(n + 1)(x' - \xi')S^{-1}(x - \xi)$ (since tr scalar = scalar). It is also well known that
under the null hypothesis this is distributed as the central Hotelling's $T^2$ with
D. F. $p$ and $n + 1 - p$, and that when in this statistic $\xi$ is replaced by
$\xi^*$ ($\ne \xi$), the resulting statistic is distributed as the noncentral Hotelling's
$T^2$ with the same D. F. and with the noncentrality parameter
$\tau^2 = (\xi^{*\prime} - \xi')\Sigma^{-1}(\xi^* - \xi)$. Going back to (4.1.1) it is thus easy to
see that if, for all $\xi$ and all nonnull $a$,

(4.1.3)    $P\left[\dfrac{(n + 1)\,a'(x - \xi^*)(x' - \xi^{*\prime})a}{a'Sa} \le c^2 \,\Big|\, \xi^* = \xi\right] = 1 - \alpha$,

then $c^2 = T^2_\alpha$ is the upper $\alpha$-point of the central Hotelling's $T^2$-distribution
with D. F. $p$ and $n + 1 - p$ and can be conveniently written as $T^2_\alpha(p, n + 1 - p)$.
From (4.1.3) we have thus, with a confidence coefficient $1 - \alpha$, the set of
simultaneous or multiple confidence bounds (for all $\xi$ and all nonnull $a$):

(4.1.4)    $a'x - [T^2_\alpha (a'Sa)/(n + 1)]^{1/2} \le a'\xi \le a'x + [T^2_\alpha (a'Sa)/(n + 1)]^{1/2}$.

It should be noted that (4.1.4) gives the simultaneous confidence bounds on all
arbitrary linear compounds of the $p$ components of the population mean vector $\xi$.
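
For $p = 2$ the quantities of Section 4.1 can be computed by hand, which gives a small sketch of the bounds (4.1.4) and of the "sup" identity behind (4.1.2). Everything named below is an assumption made for illustration. In particular `t2_upper_point` relies on the standard relation $T^2_\alpha(p, n + 1 - p) = \frac{np}{n + 1 - p} F_\alpha(p, n + 1 - p)$, which is not stated in the text above, and the $F$ critical value must be supplied by the caller.

```python
import math
import random


def inv2(S):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bilinear(u, M, v):
    """The scalar u' M v for 2-vectors u, v and a 2 x 2 matrix M."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))


def hotelling_t2(x, xi, S, n):
    """(n + 1)(x - xi)' S^{-1} (x - xi), the sup in (4.1.2)."""
    d = [x[0] - xi[0], x[1] - xi[1]]
    return (n + 1) * bilinear(d, inv2(S), d)


def directional_ratio(a, x, xi, S, n):
    """Left-hand side of (4.1.1) for one nonnull direction a."""
    numerator = (a[0] * (x[0] - xi[0]) + a[1] * (x[1] - xi[1])) ** 2
    return (n + 1) * numerator / bilinear(a, S, a)


def t2_upper_point(f_crit, p, n):
    """T^2_alpha(p, n+1-p) from a caller-supplied F_alpha(p, n+1-p)."""
    return n * p / (n + 1 - p) * f_crit


def bounds(a, x, S, n, t2_crit):
    """Confidence bounds (4.1.4) on the linear compound a' xi."""
    centre = a[0] * x[0] + a[1] * x[1]
    half_width = math.sqrt(t2_crit * bilinear(a, S, a) / (n + 1))
    return centre - half_width, centre + half_width


# Illustrative data: n + 1 = 10 individuals, p = 2.
S = [[2.0, 0.0], [0.0, 1.0]]
x = [1.0, 1.0]
rng = random.Random(0)
t2 = hotelling_t2(x, [0.0, 0.0], S, 9)
worst = max(directional_ratio([rng.uniform(-1, 1), rng.uniform(-1, 1)],
                              x, [0.0, 0.0], S, 9) for _ in range(1000))
print(t2, worst)   # no direction exceeds the T^2 value
```

The supremum is attained at $a = S^{-1}(x - \xi)$, which is why a single $T^2$ critical value controls every linear compound at once.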
The shortness (in the sense of probability) of this set of confidence bounds,
that is, the probability of these bounds covering $\xi^*$ when, in fact, $\xi^* \ne \xi$, is
obviously

    $1 - P\{\text{noncentral } T^2 \ge T^2_\alpha\}$.

From the well known fact that the power function of Hotelling's $T^2$-test is a
monotonically increasing function of the nonnegative $\tau^2$, it follows, therefore,
that the shortness of the confidence bound (4.1.4) tends to zero as $\tau \to \infty$.

From 1.2 the critical region of the associated hypothesis: $\xi$ = (a particular)
$\xi_0$, that is, of the hypothesis: $\bigcap_a (a'\xi = a'\xi_0)$, turns out to be:

    $(n + 1)(x' - \xi_0')S^{-1}(x - \xi_0) \ge T^2_\alpha$,

which implies that, for at least one $a$, the set of confidence bounds (4.1.4) does
not include $a'\xi_0$; the region of acceptance based on the opposite inequality will
imply that, for all $a$, the set of bounds (4.1.4) includes $a'\xi_0$.

4.2. Estimation and testing problem on mean differences from
$N(\xi_h, \Sigma)$ $(h = 1, 2, \cdots, k)$.

Given $X_h(p \times (n_h + 1))$: $N(\xi_h, \Sigma)$, $(h = 1, 2, \cdots, k)$, let us try to
obtain a set of simultaneous confidence bounds on all arbitrary double linear
compounds of the $p$ components of the $k$ population mean vectors measured from the
weighted grand mean vector. Consider now the statement

(4.2.1)    $\left|\sum_{h=1}^{k} b_h a'(n_h + 1)^{1/2}(x_h - x - \xi_h + \xi)\right| \le [(k - 1)\,c\,a'Sa]^{1/2}$

where $x_h$ is the mean vector for the $h$th sample,

    $x = \sum_{h=1}^{k} (n_h + 1)x_h \Big/ \sum_{h=1}^{k} (n_h + 1)$,  $\xi = \sum_{h=1}^{k} (n_h + 1)\xi_h \Big/ \sum_{h=1}^{k} (n_h + 1)$,

where $S$ is the pooled "within" covariance matrix of the $k$ samples, given by

    $\left(\sum_{h=1}^{k} n_h\right) S = \sum_{h=1}^{k} [X_h X_h' - (n_h + 1)x_h x_h']$,

and $c$ is a given positive constant, $a(p \times 1)$ is an arbitrary nonnull nonstochastic
column vector and the $b_h$'s are arbitrary coefficients subject to
$\sum_{h=1}^{k} b_h^2 = 1$. If we now use the result that

    $\sup_b \left|\sum_{h=1}^{k} b_h y_h\right| = \left[\sum_{h=1}^{k} y_h^2\right]^{1/2}$,

then it directly follows that, given all the other quantities including $a$, and under
all possible variations of the $b_h$'s subject to $\sum_{h=1}^{k} b_h^2 = 1$, the statement
(4.2.1) is precisely equivalent to the statement that

    $\left[\sum_{h=1}^{k} \{a'(n_h + 1)^{1/2}(x_h - x - \xi_h + \xi)\}^2\right]^{1/2} \le [(k - 1)\,c\,a'Sa]^{1/2}$

or

(4.2.2)    $\sum_{h=1}^{k} a'(n_h + 1)(x_h - x - \xi_h + \xi)(x_h' - x' - \xi_h' + \xi')a \Big/ (k - 1)\,a'Sa \le c$.

Letting now $a$ vary and putting

    $(k - 1)S^* = \sum_{h=1}^{k} (n_h + 1)(x_h - x - \xi_h + \xi)(x_h' - x' - \xi_h' + \xi')$,

the statement (4.2.2), for all possible values of the nonnull $a$, is precisely
equivalent to:

(4.2.3)    $\sup_a\,[a'S^*a / a'Sa] \le c$.

As observed in a previous paper [1], $S$ is, a.e., p.d. and $S^*$ is, a.e., p.s.d. of rank
$q = \min(p, k - 1)$ (p.s.d. if $p > k - 1$ and p.d. if $p \le k - 1$), and
$\sup_a\,[a'S^*a / a'Sa]$ is just the largest root $\theta_q$ of the $p$th degree
determinantal equation in $\theta$: $|S^* - \theta S| = 0$. Of this equation all roots are
nonnegative, $p - q$ of them always zero and $q$ are, a.e., positive. Thus (4.2.3) and
hence (4.2.2) and (4.2.1), under all permissible variations of $a$ and the $b_h$'s, turn
out to be equivalent to:

(4.2.4)    $\theta_q \le c$.
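
For $p = 2$ the determinantal equation $|S^* - \theta S| = 0$ is a quadratic in $\theta$, so the largest root can be written down directly. The sketch below is an illustration only; the function name `largest_root_2x2` is hypothetical, both matrices are assumed symmetric, and $S$ is assumed p.d.

```python
import math


def largest_root_2x2(S_star, S):
    """Largest root of |S* - theta S| = 0 for symmetric 2 x 2 matrices, S p.d.

    Expanding the determinant gives A theta^2 - B theta + C = 0 with
    A = |S|, B = s*11 s22 + s*22 s11 - 2 s*12 s12, C = |S*|.
    """
    A = S[0][0] * S[1][1] - S[0][1] ** 2
    B = (S_star[0][0] * S[1][1] + S_star[1][1] * S[0][0]
         - 2.0 * S_star[0][1] * S[0][1])
    C = S_star[0][0] * S_star[1][1] - S_star[0][1] ** 2
    disc = max(B * B - 4.0 * A * C, 0.0)   # guard against rounding below zero
    return (B + math.sqrt(disc)) / (2.0 * A)


def ratio(a, S_star, S):
    """The quantity a' S* a / a' S a for one nonnull direction a."""
    def q(M):
        return sum(a[i] * M[i][j] * a[j] for i in range(2) for j in range(2))
    return q(S_star) / q(S)


S = [[2.0, 0.0], [0.0, 1.0]]
S_star = [[2.0, 0.0], [0.0, 3.0]]
theta_q = largest_root_2x2(S_star, S)
print(theta_q, ratio([1.0, 1.0], S_star, S))   # the ratio never exceeds the root
```

For general $p$ one would solve the generalized eigenvalue problem numerically; the closed form here is specific to the $2 \times 2$ case.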


The distribution of $\theta_q$ on the null hypothesis is known and relatively easy and
involves as parameters $p$, $k - 1$, and $\sum_{h=1}^{k} n_h$. Computation of the 5 per cent
and 1 per cent points is in progress. Thus if

(4.2.5)    $P[\theta_q \le \theta_\alpha \mid \text{null hypothesis}] = 1 - \alpha$,

we can write $\theta_\alpha = \theta_\alpha(p, k - 1, \sum_{h=1}^{k} n_h)$, and now combining
(4.2.1)-(4.2.5), we have, with a confidence coefficient $1 - \alpha$, the following set of
multiple confidence statements (for all $\xi_h$'s, all nonnull $a$'s and all $b_h$'s subject
to $\sum_{h=1}^{k} b_h^2 = 1$):

(4.2.6)    $\sum_{h=1}^{k} b_h a'(n_h + 1)^{1/2}(x_h - x) - [(k - 1)\theta_\alpha a'Sa]^{1/2}$
               $\le \sum_{h=1}^{k} b_h a'(n_h + 1)^{1/2}(\xi_h - \xi)$
               $\le \sum_{h=1}^{k} b_h a'(n_h + 1)^{1/2}(x_h - x) + [(k - 1)\theta_\alpha a'Sa]^{1/2}$,

where $\theta_\alpha = \theta_\alpha(p, k - 1, \sum_{h=1}^{k} n_h)$. This gives simultaneous
confidence bounds on all arbitrary double linear compounds of the $p$ components of the
difference between the $k$ population mean vectors $\xi_h$'s and the weighted grand mean
of these, which is $\xi$.

To discuss the shortness of (4.2.6) consider the noncentral distribution of
$\theta_q$ of (4.2.6), that is, the distribution of the statistic $\theta_q^*$ obtained by
replacing, in $\theta_q$, $\xi_h$ by $\xi_h^*$ ($\ne \xi_h$). This distribution is extremely
difficult but is well known to involve as parameters, besides the D. F.'s, the positive
roots $\Theta_1, \Theta_2, \cdots, \Theta_s$ ($s \le \min(p, k - 1)$) of the determinantal
equation in $\Theta$: $|\Sigma^* - \Theta\Sigma| = 0$. Here $\Sigma$ is the common covariance
matrix of the $k$ populations and

    $\Sigma^* = (k - 1)^{-1} \sum_{h=1}^{k} (n_h + 1)(\xi_h^* - \xi^* - \xi_h + \xi)(\xi_h^{*\prime} - \xi^{*\prime} - \xi_h' + \xi')$.

This $\Sigma^*$ is necessarily at least p.s.d. of rank $\le \min(p, k - 1)$, $= s$ (say), so
that, out of the $p$ roots of the equation in $\Theta$, $p - s$ are zero and $s$ positive. If
now we write formally, when the probability is computed under an alternative,

(4.2.7)    $P\left[\theta_q^* \le \theta_\alpha(p, k - 1, \sum_{h=1}^{k} n_h)\right] = \psi(\alpha, p, k - 1, \sum_{h=1}^{k} n_h, \Theta_1, \Theta_2, \cdots, \Theta_s)$,


then we note that while $\psi$ is difficult to obtain, a good upper bound to it [1] is
given by

(4.2.8)    $\psi < [P(\text{central } F < \theta_\alpha)]^{p-s} \prod_{i=1}^{s} P\{\text{noncentral } F < \theta_\alpha \mid \Theta_i\}$,

where all $F$'s are on D. F. $(k - 1)$ and $\sum_{h=1}^{k} n_h$. Furthermore, as stated and
proved elsewhere [1], this $\psi$ is also a monotonically decreasing function of the
deviation parameters and tends to zero as these tend to infinity.

With two populations (and samples), we have $q = \min(p, 1) = 1$, and thus
only one positive sample root, say $\theta$, and at the most one positive population
root, say $\Theta$. It is easy to check that in this case

(4.2.9)    $\theta = \dfrac{(n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2}\,(x_1' - x_2' - \xi_1' + \xi_2')\,S^{-1}(x_1 - x_2 - \xi_1 + \xi_2)$,

           $\Theta = \dfrac{(n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2}\,(\xi_1^{*\prime} - \xi_2^{*\prime} - \xi_1' + \xi_2')\,\Sigma^{-1}(\xi_1^* - \xi_2^* - \xi_1 + \xi_2)$,

and it is well known that, on the null hypothesis, $\theta$ is distributed as central
Hotelling's $T^2$ with D. F. $p$ and $n_1 + n_2 + 1 - p$, and on the alternatives as
noncentral Hotelling's $T^2$ with the same D. F. and with a deviation parameter $\Theta$. It
is also easy to check that in this case the confidence statement (4.2.6) reduces to

(4.2.10)    $a'(x_1 - x_2) - \left[\dfrac{n_1 + n_2 + 2}{(n_1 + 1)(n_2 + 1)}\,T^2_\alpha\,a'Sa\right]^{1/2} \le a'(\xi_1 - \xi_2)$
                $\le a'(x_1 - x_2) + \left[\dfrac{n_1 + n_2 + 2}{(n_1 + 1)(n_2 + 1)}\,T^2_\alpha\,a'Sa\right]^{1/2}$,

where $T^2_\alpha = T^2_\alpha(p, n_1 + n_2 + 1 - p)$ is the upper $\alpha$-point of Hotelling's
$T^2$. The shortness of (4.2.10), which is now a degenerate form of (4.2.7), is exactly
known and of course tends to zero as $\Theta \to \infty$.

From Section 1.2, the critical region of the associated hypothesis
$\xi_1 = \xi_2 = \cdots = \xi_k$, that is, of the hypothesis
$\bigcap_a (a'\xi_i = a'\xi)$ $(i = 1, 2, \cdots, k)$, turns out
to be the same as given in a previous paper, namely:

(4.2.11)    $\theta_q \ge \theta_\alpha\left(p, k - 1, \sum_{i=1}^{k} n_i\right)$

with a power function $1 - \psi(\alpha, p, k - 1, \sum_{i=1}^{k} n_i, \Theta_1, \cdots, \Theta_s)$,
where $\theta_q$ is the largest characteristic root of
$(k - 1)^{-1} S^{-1} \sum_{i=1}^{k} (n_i + 1)(x_i - x)(x_i' - x')$ and
where the $\Theta$'s are the roots of the equation in $\Theta$:

    $\left|(k - 1)^{-1} \sum_{i=1}^{k} (n_i + 1)(\xi_i - \xi)(\xi_i' - \xi') - \Theta\Sigma\right| = 0$.

The properties of this power function, such as indicated under (4.2.8), have
been already discussed in [1].
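
The two-sample reduction of Section 4.2, where the single positive root becomes Hotelling's $T^2$, can be checked in a few lines. The shorthand $N_h = n_h + 1$, $d_h$ and $\delta$ is introduced here for the derivation only and is not notation from the paper.

```latex
% Shorthand (assumed here): N_h = n_h + 1,  d_h = x_h - x - \xi_h + \xi,
% \delta = (x_1 - x_2) - (\xi_1 - \xi_2).  With k = 2 the factor (k - 1) equals 1.
\begin{align*}
d_1 &= \frac{N_2}{N_1 + N_2}\,\delta, \qquad
d_2 = -\frac{N_1}{N_1 + N_2}\,\delta, \\
S^* &= N_1 d_1 d_1' + N_2 d_2 d_2'
     = \frac{N_1 N_2^2 + N_2 N_1^2}{(N_1 + N_2)^2}\,\delta\delta'
     = \frac{N_1 N_2}{N_1 + N_2}\,\delta\delta', \\
\theta &= \sup_a \frac{a' S^* a}{a' S a}
       = \frac{N_1 N_2}{N_1 + N_2}\,\sup_a \frac{(a'\delta)^2}{a' S a}
       = \frac{(n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2}\,\delta' S^{-1} \delta .
\end{align*}
```

The first line uses $x = (N_1 x_1 + N_2 x_2)/(N_1 + N_2)$ and the corresponding expression for $\xi$; the last step is the same "sup" identity that was used for (4.1.2).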

4.3. An important subset of the set of bounds (4.2.6). Suppose now that, instead 
of all contrasts of the type: Doha baa’(ma + 1)*(& — &) (with the given restric- 
tions on a and the b’s), we are interested in contrasts of the type: a’(& — &:), 
for all nonnull a’ and all h # 1 = 1, 2,--- , k. It is easy to offer a multiple set 
of confidence bounds for contrasts of this type, which can be regarded as one 
kind of multivariate (under unequal sample sizes) analogue of a somewhat 
similar set given by Tukey for the corresponding univariate situations, and dis- 
cussed in Section 2 of this paper. The proposed set is built up as follows. With 
the same notation as before, and with ma: = (m, + 1) (nr + 1)/(m + mi + 2) 
note that 


m2 / , , ee | 
Thr = Mare (Te — Zi — & + E)S” (Ga — Zi — Ex + ED 
, / , i 
= mya’(tn — Li — En + E1)(Ze — Ti — Ew + E1)a/a’Sa. 


Thus, for a given pair (h, 1), the statement that Tht < Ta is exactly equivalent 
to the statement that, for all nonnull a’s, 


a’(t, — 21) — [T'a’Sa/nni}' S a’ (& — &:) S a'(aa — 2) 
+ [T2a’Sa/nni}’. 


We observe that when the true population means are é’s, Ti: is distributed as 
Hotelling’s 7° with D. F. p and Dufay, + 1 — p. 
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Now, considering all pairs (h, 1) out of k samples (and k populations), it is 
easy to see that the statement: all Tis S T.. is precisely equivalent to the 
statement that the largest Ti, out of all pairs is S T., which again is equivalent 
to the statement that, for all nonnull a’s and all pairs (h, 1) out of k, 


(4.3.1) a’(t, — 22) — [Ta’Sa/nm}' S a'(fs — £2) 
< a’ (xp — IZ — [T’,a'Sa nal’. 


If the confidence coefficient of (4.3.1) is to be 1 — a, then T, 
T.(p, Mime, +++, nm) will be given by 


(4.3.2) P| Largest 7), out of ” pars 2 7, null a = a. 


It will be obvious that the distribution of the largest 7; involves as parameters 
just p and m, m2, --- , nm. It is easy to see that the distribution is manageable 
only when the number of parameters is small. In particular, the case that n; = 
Ng = +++ = m and p = i, is identical with the one considered in Section 2.2. 
It may also be noted that when k = 2, the largest 7),) will of course be Hotelling’s 
T° distributed with D. F. p and n, + nz. + 1 — p. Also the shortness of the con- 
fidence bounds (4.3.1) can be formally written as 

- 


| 


m2 i k . al ‘ 
P | Largest 7,1 out of (5) pairs S T'4(p, mi, 2, °-* , m) | alternative 


as 
It is important to observe that while each $T^2_{hl}$ is individually distributed (on
the null hypothesis) as a central Hotelling's $T^2$ with D.F. $p$ and $\sum_{r=1}^{k} n_r + 1 - p$,
the $\binom{k}{2}$ $T^2_{hl}$'s are not independent, nor do we know what the distribution
of the largest central $T^2_{hl}$ is, to say nothing of the noncentral case, so that the
confidence statement (4.3.1) has not been reduced to concrete terms as was done
for the other cases discussed in this paper. The distribution problem arising in
this situation is now under investigation.
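
Since no closed form for the distribution of the largest $T^2_{hl}$ is given here, one way to get a feel for $T^2_\alpha$ is simulation. The following is only a rough sketch under loudly stated assumptions: it treats the univariate case $p = 1$ alone, takes unit-variance normal samples with equal means, and estimates the upper $\alpha$ point by Monte Carlo. The function names, the number of replications and the seed are illustrative choices, not part of the paper.

```python
import random


def largest_T2(samples):
    """Largest pairwise T^2_hl over all pairs, for univariate samples (p = 1).

    Each sample has n_h + 1 observations; the pooled variance has sum(n_h) D.F.
    """
    k = len(samples)
    means = [sum(s) / len(s) for s in samples]
    dof = sum(len(s) - 1 for s in samples)
    pooled = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s) / dof
    best = 0.0
    for h in range(k):
        for l in range(h + 1, k):
            mh, ml = len(samples[h]), len(samples[l])
            n_hl = mh * ml / (mh + ml)  # (n_h + 1)(n_l + 1)/(n_h + n_l + 2)
            best = max(best, n_hl * (means[h] - means[l]) ** 2 / pooled)
    return best


def simulate_T2_alpha(sizes, alpha, reps=2000, seed=0):
    """Monte Carlo estimate of T^2_alpha for p = 1 (an assumption of this sketch)."""
    rng = random.Random(seed)
    draws = sorted(
        largest_T2([[rng.gauss(0.0, 1.0) for _ in range(m)] for m in sizes])
        for _ in range(reps)
    )
    return draws[min(reps - 1, int((1 - alpha) * reps))]
```

For example, `simulate_T2_alpha([5, 5, 5], 0.05)` gives a simulated critical value for three samples of five observations each; a multivariate version would replace the scalar ratio by the quadratic form in $S^{-1}$.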

For the associated problem of testing $H_0: \xi_1 = \cdots = \xi_k$, we set up as before
the rule that if, for all nonnull $a$ and all pairs $(h, l)$, the bounds (4.3.1) include
zero, we accept $H_0$ and reject it otherwise. The properties (including power) of
this test are tied up in an obvious manner with those of the multiple confidence
interval statement (4.3.1).
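
As an illustration of how one bound of the form (4.3.1) is evaluated, the sketch below computes the interval for a single direction $a$ and a single pair $(h, l)$. It is a minimal example, not the authors' procedure: the critical value `T2_alpha` is assumed to be supplied from elsewhere (its distribution is not available in closed form here), and the function and argument names are hypothetical.

```python
import math


def quad_form(a, S):
    """Return a'Sa for a vector a and a square matrix S (nested lists)."""
    p = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(p) for j in range(p))


def pairwise_bounds(a, xbar_h, xbar_l, S, n_h, n_l, T2_alpha):
    """Bounds of the form (4.3.1) on a'(xi_h - xi_l) for one a and one pair.

    n_h and n_l are degrees of freedom (sample sizes minus one).
    T2_alpha is an ASSUMED, externally supplied critical value.
    """
    n_hl = (n_h + 1) * (n_l + 1) / (n_h + n_l + 2)
    centre = sum(ai * (xh - xl) for ai, xh, xl in zip(a, xbar_h, xbar_l))
    half_width = math.sqrt(T2_alpha * quad_form(a, S) / n_hl)
    return centre - half_width, centre + half_width
```

Checking whether such an interval covers zero for one chosen $a$ is only a partial check; the acceptance rule above refers to all nonnull $a$ at once, which amounts to comparing $T^2_{hl}$ itself with $T^2_\alpha$.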

Notice that so far, in testing of hypotheses by inversion of confidence statements,
we have considered two-decision problems. Suppose, at this point, for
purposes of illustration, we offer a multi-decision procedure, namely that, for
a given pair $(h, l)$, we accept or reject $H(\xi_h = \xi_l)$ according as all those bounds
(4.3.1) which involve $\bar{x}_h$ and $\bar{x}_l$ only include or exclude zero. It is obvious that
in all the other situations considered so far we could set up similar multi-decision
procedures.


4.4. Further observations. In many situations it might be of greater physical 
interest to be able to make, instead of (4.2.6) or even of (4.3.1), a set of just
$p\binom{k}{2}$ confidence interval statements, each relating to just one variate and
the difference between one of $\binom{k}{2}$ pairs. In other words, if $\xi_h' = (\xi_{1h}, \xi_{2h}, \cdots, \xi_{ph})$
$(h = 1, 2, \cdots, k)$ denote the $p$ means for the $h$th population, then we would
like to make a statement of the form


(4.4.1)   $f_{jhh'}(X_1, X_2, \cdots, X_k) \le \xi_{jh} - \xi_{jh'} \le F_{jhh'}(X_1, X_2, \cdots, X_k)$


(with obvious applications to subsections 2.1 and 2.2), for all $h \ne h' = 1, 2, \cdots, k$
and all $j = 1, 2, \cdots, p$, where $f_{jhh'}$ and $F_{jhh'}$ are supposed to be two different
functions of the whole set of $p \times \sum_{h=1}^{k}(n_h + 1)$ raw observations. It is clear
that (4.4.1) is a subset of (4.3.1) which again is a subset of (4.2.6). Whether it
is possible to make a statement like (4.4.1) in an elegant and useful way (i.e.,
with manageable functions $f_{jhh'}$ and $F_{jhh'}$) and with a given joint confidence
coefficient $1 - \alpha$, that is, free of the nuisance parameters $\Sigma$, is still an open
question. It may well be that a range (not too wide) for the confidence coefficient
itself is called for. Furthermore, whatever set of confidence intervals like (4.4.1)
we propose, be it under a fixed confidence coefficient or under a confidence
coefficient lying in a short range, the "goodness" of such a set would pose further
questions. The authors believe that in this situation a more promising approach
may be one involving a suitable two-stage procedure.

4.5. General linear hypothesis and linear estimation. In place of the setup of 
subsection 4.2, let us consider the following more general one. Suppose we have 
a matrix $X(p \times n)$, consisting of $n$ independently distributed $p$-dimensional
column vectors $x_1, \cdots, x_n$, each being multinormal with the same covariance
matrix $\Sigma$. Suppose, further, that $E(X) = \xi(p \times m)B(m \times n)$, $(m < n)$, where
$B$ is a given (nonstochastic) matrix of rank $m_0 \le m$ and $\xi(p \times m)$ is a set of unknown
parameters. Suppose now that under this model we are interested in
the problem of multiple or simultaneous estimation of a set of estimable linear
vector parameters $\xi(p \times m)l(m \times 1)$, for all $l$ in a vector space of rank $r \le m_0 \le m < n$.
Also let $x_{B,l} = X(p \times n)c(n \times 1)$ be the best linear estimate of $\xi l$
(notice that $c$ can be obtained in terms of $B$ and $l$ and that the estimate of the
covariance matrix of $x_{B,l}$, to be called $S_{B,l}$, is also available in terms of $B$ and
$l$ and the $p \times n$ matrix of observations $X$). Thus, given $B$ of rank $m_0 \le m < n$,
we have, for all nonnull $p$-column-vectors $a$ and all estimable linear functions
$\xi l$ (for the $l$'s under consideration), by using the techniques of the previous
sections, the set of simultaneous confidence interval statements (with confidence
coefficient $1 - \alpha$):


(4.5.1)   $a'x_{B,l} - [r\theta_\alpha\, a'S_{B,l}a]^{1/2} \le a'\xi l \le a'x_{B,l} + [r\theta_\alpha\, a'S_{B,l}a]^{1/2},$


where $\theta_\alpha = \theta_\alpha(p, r, n - m_0)$ is defined in terms of the relevant parameters exactly
the same way as in subsection 4.2. The tie-up of (4.5.1) with the univariate
confidence bounds given in (2.1.6) of Section 2.1 will be obvious.
The inverse problem of testing of hypothesis would go through exactly the
same way as in subsection 4.2 and need not be separately considered here.


5. Multivariate estimation and testing problems on covariance matrices. 

5.1. Problem on $\Sigma$ from an $N(\xi, \Sigma)$. As suggested in Section 3, let us start from
a $Y(p \times n): N(0, \Sigma)$, where $\Sigma(p \times p)$ is supposed to be p.d. (so that its characteristic
roots are all positive). For simplicity we also assume that $p \le n$, so that,
a.e., $YY'$, that is, $nS$ is p.d., and hence all its characteristic roots are positive.
We now recall the well known result that there exists an orthogonal $\Gamma(p \times p)$
such that $\Sigma(p \times p) = \Gamma(p \times p)D_\theta(p \times p)\Gamma'(p \times p)$ where the $\theta$'s are the
characteristic roots of $\Sigma$. If the roots are distinct then by a convention, say by
taking all the elements of the first row of $\Gamma$ to be positive, the transformation
could be made one-to-one. However, we do not need this for our present purpose.
Note that the number of independent elements on both sides is the same. We
shall discuss the estimation and testing problems not in terms of $\Sigma$ but in terms
of the equivalent set $\Gamma$ and $\theta$. Except for the factor $(-\tfrac{1}{2})$ the argument under the
exponential in the probability density of $Y$ can now be written, if we put
$\lambda = \theta^{-1/2}$, as


$\mathrm{tr}\,(\Gamma D_\theta\Gamma')^{-1}YY' = \mathrm{tr}\,\Gamma D_\lambda D_\lambda\Gamma'YY' = \mathrm{tr}\,(D_\lambda\Gamma'Y)(D_\lambda\Gamma'Y)'.$

If we put $Z = D_\lambda\Gamma'Y$, it is easy to check that the probability density of $Z$ is

(5.1.1)   $[2\pi]^{-pn/2}\exp[-\tfrac{1}{2}\,\mathrm{tr}\,ZZ'].$


Let us now try to obtain a set of simultaneous confidence bounds on a class of
arbitrary p.d. quadratic functions of the elements of the population matrix
$D_\lambda\Gamma'$ (to be brought out in (5.1.5)). For all nonnull nonstochastic $a(p \times 1)$ consider
now the simultaneous statement that


(5.1.2)   $c_1 \le a'ZZ'a/a'a \le c_2$   or   $c_1 \le a'(D_\lambda\Gamma'YY'\Gamma D_\lambda)a/a'a \le c_2.$


This statement, for a given $Z$ and $c_1$ and $c_2$, is precisely equivalent to the statement
that

$c_1 \le \inf_a \dfrac{a'ZZ'a}{a'a} \le \sup_a \dfrac{a'ZZ'a}{a'a} \le c_2,$


or that


(5.1.3)   $c_1 \le \theta_1 \le \theta_p \le c_2,$


where $\theta_1$ and $\theta_p$ are the smallest and largest characteristic roots of the matrix
$ZZ'$, both, a.e., positive. The relevant distributions on the null hypothesis, that
is, when the true population matrix is $\Sigma$, being known, let us determine $c_1$ and
$c_2$ from the relations

(5.1.4)   $P(c_1 \le \theta_1 \le \theta_p \le c_2 \mid \Sigma) = 1 - \alpha$   and   $P(c_1 \le \theta_1 \mid \Sigma) = P(\theta_p \le c_2 \mid \Sigma).$
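
The passage from a statement about all nonnull $a$ to a statement about the two extreme characteristic roots can be checked numerically. The sketch below does so for a symmetric $2 \times 2$ matrix standing in for $ZZ'$; the restriction to $p = 2$, the closed-form root formula and the function names are assumptions made for this illustration only.

```python
import math


def sym2_roots(M):
    """Smallest and largest characteristic roots of a symmetric 2x2 matrix."""
    m11, m12, m22 = M[0][0], M[0][1], M[1][1]
    mean = (m11 + m22) / 2.0
    radius = math.hypot((m11 - m22) / 2.0, m12)
    return mean - radius, mean + radius


def rayleigh(a, M):
    """The ratio a'Ma / a'a for a nonnull 2-vector a."""
    num = sum(a[i] * M[i][j] * a[j] for i in range(2) for j in range(2))
    return num / (a[0] ** 2 + a[1] ** 2)
```

Scanning `rayleigh` over directions $a = (\cos t, \sin t)$ shows every ratio falling between the two values returned by `sym2_roots`, with the extremes attained at the characteristic vectors.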


We can write $c_1$ and $c_2$ as $\theta_{1\alpha}(p, n)$ and $\theta_{2\alpha}(p, n)$. If we now tie up (5.1.2), (5.1.3)
and (5.1.4) we have, with a confidence coefficient $1 - \alpha$, the set of multiple or
simultaneous confidence interval statements for all nonnull $a$ and all permissible
values of the unknown parameters $\Gamma$ and $\theta$:


(5.1.5)   $a'a\,\theta_{1\alpha}(p, n) \le a'(D_\lambda\Gamma'YY'\Gamma D_\lambda)a \le a'a\,\theta_{2\alpha}(p, n)$

or, remembering that $nS = YY'$,


$a'a\,\theta_{1\alpha}(p, n) \le a'(D_\lambda\Gamma'\,nS\,\Gamma D_\lambda)a \le a'a\,\theta_{2\alpha}(p, n).$


The confidence statement is on the parametric matrix $D_\lambda\Gamma'$ which, as will be
presently seen, plays the same part as $1/\sigma$ in univariate problems. Furthermore,
we note that (5.1.5) gives a set of simultaneous confidence bounds on a class of
arbitrary p.d. quadratic functions of the elements of the population matrix
$D_\lambda\Gamma'$ such that the elements of the observed sample covariance matrix $S$ also
enter into the coefficients of the quadratic functions. Note that when $p = 1$,
that is, in the univariate case, $\Gamma = \Gamma' = 1$ (with the convention we are using),
$\Sigma = \sigma^2$, $D_\lambda = 1/\sigma$, $a' = a =$ a scalar, so that (5.1.5) will reduce to

(5.1.6)   $\chi^2_{1\alpha} \le ns^2/\sigma^2 \le \chi^2_{2\alpha}$   or   $ns^2/\chi^2_{1\alpha} \ge \sigma^2 \ge ns^2/\chi^2_{2\alpha},$

where $\chi^2_{1\alpha}$ and $\chi^2_{2\alpha}$ are just the lower and upper $\alpha/2$-points of $\chi^2$ with $n$ D.F.
It is easy to see by inversion of (5.1.5) that for the associated hypothesis
$H_0: \Sigma = \Sigma_0 = \Gamma_0 D_{\theta_0}\Gamma_0'$ (say), we have the critical region:


(5.1.7)   $\phi_p \ge \theta_{2\alpha}(p, n)$ and/or $\phi_1 \le \theta_{1\alpha}(p, n),$


where $\phi_p$ and $\phi_1$ are the largest and smallest characteristic roots of the matrix
$D_{\lambda_0}\Gamma_0'YY'\Gamma_0 D_{\lambda_0}$. The shortness of the confidence bounds (5.1.5) is tied up with
the power of (5.1.7) and the general nature and properties of this have been
already indicated in a previous paper [1].

5.2. Problem of comparison between $\Sigma_1$ and $\Sigma_2$ from $N(\xi_1, \Sigma_1)$ and $N(\xi_2, \Sigma_2)$.
Let us start from $Y_i(p \times n_i): N(0, \Sigma_i)$ $(i = 1, 2)$, where we assume that $p \le n_1, n_2$,
and that $\Sigma_1$ and $\Sigma_2$ are both p.d. so that the characteristic roots of $\Sigma_1\Sigma_2^{-1}$
are all positive and those of $Y_1Y_1'(Y_2Y_2')^{-1}$, that is, of $(n_1/n_2)S_1S_2^{-1}$ are, a.e., all
positive. We recall that there exists a nonsingular $\mu(p \times p)$ such that $\Sigma_1 = \mu D_\theta\mu'$
and $\Sigma_2 = \mu\mu'$, where the $\theta$'s are the characteristic roots of $\Sigma_1\Sigma_2^{-1}$. If these roots
are distinct, then by a convention, say taking all the elements of the first row
of $\mu$ to be positive, the transformation could be made one-to-one. Noting that
the number of independent elements on both sides is the same we shall work in
terms of $\mu$ and the $\theta$'s, instead of $\Sigma_1$ and $\Sigma_2$. (As in Section 5.1 we put $\lambda = \theta^{-1/2}$.)
Except for the factor $(-1/2)$ the argument under the exponential in the probability
density of $Y_1$ and $Y_2$ can be written as


(5.2.1)   $\mathrm{tr}\,[(\mu D_\theta\mu')^{-1}Y_1Y_1' + (\mu\mu')^{-1}Y_2Y_2'] = \mathrm{tr}\,[(D_\lambda\mu^{-1}Y_1)(D_\lambda\mu^{-1}Y_1)' + (\mu^{-1}Y_2)(\mu^{-1}Y_2)'].$


If we now put $Z_1 = D_\lambda\mu^{-1}Y_1$ and $Z_2 = \mu^{-1}Y_2$, it is easy to check that the probability
density of $Z_1$ and $Z_2$ is


$(2\pi)^{-p(n_1 + n_2)/2}\exp[-\tfrac{1}{2}\,\mathrm{tr}\,(Z_1Z_1' + Z_2Z_2')].$
We shall now obtain (see (5.2.4)) a set of simultaneous confidence bounds on a
class of arbitrary p.d. quadratic functions of the elements of the population
matrix $\mu D_\lambda\mu^{-1}$. For all nonnull nonstochastic $a(p \times 1)$ consider the set of statements


(5.2.2)   $c_1 \le a'Z_1Z_1'a/a'Z_2Z_2'a \le c_2$   or

$c_1 \le a'(D_\lambda\mu^{-1}Y_1)(D_\lambda\mu^{-1}Y_1)'a/a'(\mu^{-1}Y_2)(\mu^{-1}Y_2)'a \le c_2$   or

$\dfrac{n_2}{n_1}c_1 \le a'(D_\lambda\mu^{-1}S_1\mu'^{-1}D_\lambda)a/a'(\mu^{-1}S_2\mu'^{-1})a \le \dfrac{n_2}{n_1}c_2.$
ny 

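For given $Z_1$, $Z_2$, $c_1$ and $c_2$ this statement is precisely equivalent to the statement
that


$c_1 \le \inf_a \dfrac{a'Z_1Z_1'a}{a'Z_2Z_2'a} \le \sup_a \dfrac{a'Z_1Z_1'a}{a'Z_2Z_2'a} \le c_2$


or

(5.2.3)   $c_1 \le \theta_1 \le \theta_p \le c_2,$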

where $\theta_1$ and $\theta_p$ are the smallest and largest characteristic roots of the matrix
$(Z_1Z_1')(Z_2Z_2')^{-1}$, both, a.e., positive. The relevant distributions on the null
hypothesis (i.e., when the true population matrices are $\Sigma_1$ and $\Sigma_2$) being known,
let us determine $c_1$ and $c_2$ from the relations formally similar to (5.1.4) and
write $c_1$ and $c_2$ as $\theta_{1\alpha}(p, n_1, n_2)$ and $\theta_{2\alpha}(p, n_1, n_2)$, remembering that these $\theta_{1\alpha}$
and $\theta_{2\alpha}$ will be different in form from those given in (5.1.4). If we now tie up
(5.2.2) and (5.2.3) and put $a'\mu^{-1} = b'$, we have (with a confidence coefficient
$1 - \alpha$), the set of simultaneous confidence interval statements for all nonnull
$b$ and all permissible values of the unknown parameters $\mu$ and $\theta$:


(5.2.4)   $\dfrac{n_2}{n_1}\theta_{1\alpha}(p, n_1, n_2)\,b'S_2b \le b'(\mu D_\lambda\mu^{-1}S_1\mu'^{-1}D_\lambda\mu')b \le \dfrac{n_2}{n_1}\theta_{2\alpha}(p, n_1, n_2)\,b'S_2b.$



The confidence statement relates to the parametric matrix $\mu D_\lambda\mu^{-1}$ which, as
will be noticed presently, plays the same part as $\sigma_2/\sigma_1$ in univariate problems.
It may be observed that (5.2.4) gives a set of confidence bounds on a class of
arbitrary p.d. quadratic functions of the elements of the population matrix
$\mu D_\lambda\mu^{-1}$ such that the elements of the observed sample matrix $S_1$ also enter into
the coefficients of the quadratic functions. As in the previous case, note that
when $p = 1$, $b = b' =$ a scalar, $\Sigma_1 = \sigma_1^2$, $\Sigma_2 = \sigma_2^2$ (both scalars), $D_\lambda = \sigma_2/\sigma_1$
and $\mu D_\lambda\mu^{-1} = \sigma_2/\sigma_1$, so that (5.2.4) reduces to


(5.2.5)   $F_{2\alpha}^{-1}\,s_1^2/s_2^2 \le \sigma_1^2/\sigma_2^2 \le F_{1\alpha}^{-1}\,s_1^2/s_2^2,$


where $F_{1\alpha}$ and $F_{2\alpha}$ are the lower and upper $\alpha/2$-points of the $F$-distribution with
D.F. $n_1$ and $n_2$.


It is easy to see by inversion of (5.2.4) that, for the associated hypothesis
$H_0: \Sigma_1 = \Sigma_2$ which turns up if and only if $D_\lambda = I(p)$, we have the critical region
obtained in the previous paper [1], namely,


(5.2.6)   $\phi_p \ge \theta_{2\alpha}(p, n_1, n_2)$ and/or $\phi_1 \le \theta_{1\alpha}(p, n_1, n_2),$


where $\phi_p$ and $\phi_1$ are the largest and smallest characteristic roots of the matrix
$(\mu^{-1}Y_1Y_1'\mu'^{-1})(\mu^{-1}Y_2Y_2'\mu'^{-1})^{-1}$, and hence of $\mu^{-1}(Y_1Y_1')(Y_2Y_2')^{-1}\mu$,
or of $(Y_1Y_1')(Y_2Y_2')^{-1}$ or finally, of $(n_1/n_2)S_1S_2^{-1}$.

The shortness of the confidence bounds (5.2.4) is tied up with the power of
(5.2.6) which already has been discussed in [1].

5.3. Some consequences of (5.1.5) and (5.2.4). From the confidence statements
(5.1.5) and (5.2.4) a whole chain of results follows from the following set of
theorems (the proof of which is obvious): if $x'Ax$ and $x'Bx$ are two p.d. quadratic
forms such that $x'Ax \ge x'Bx$ for all $x$, then (a) the roots in $\theta$ of $|A - \theta B| = 0$
are all real and $\ge 1$, (b) $y'B^{-1}y \ge y'A^{-1}y$ for all $y$, (c) $|A| \ge |B|$, (d) any
principal minor of $A$ is greater than or equal to the corresponding principal minor
of $B$ and (e) any principal minor of $B^{-1}$ is greater than or equal to the corresponding
principal minor of $A^{-1}$. When these are applied to (5.1.5) or (5.2.4) one
obtains


(5.3.1)   $(\theta_{1\alpha})^{-p}\,|nS| \ge |\Sigma| \ge (\theta_{2\alpha})^{-p}\,|nS|,$


(5.3.2)   $(\theta_{1\alpha})^{-p}\,\dfrac{|n_1S_1|}{|n_2S_2|} \ge \dfrac{|\Sigma_1|}{|\Sigma_2|} \ge (\theta_{2\alpha})^{-p}\,\dfrac{|n_1S_1|}{|n_2S_2|}.$

Further consequences will be given in a later paper.
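
Parts (a) and (c) of the theorem above can be illustrated numerically. The sketch below handles only symmetric $2 \times 2$ matrices, solving the quadratic $|A - \theta B| = 0$ directly; the restriction to $p = 2$ and the helper names are assumptions of this example.

```python
import math


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def relative_roots(A, B):
    """Roots theta of |A - theta B| = 0 for symmetric 2x2 A and p.d. 2x2 B.

    Expanding the determinant gives
    |B| theta^2 - (a11 b22 + a22 b11 - 2 a12 b12) theta + |A| = 0.
    """
    qa = det2(B)
    qb = -(A[0][0] * B[1][1] + A[1][1] * B[0][0] - 2.0 * A[0][1] * B[0][1])
    qc = det2(A)
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return (-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)
```

With $A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, the difference $A - B$ is p.d., both roots are at least 1, and $|A| \ge |B|$, in line with (a) and (c).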


6. Multivariate estimation and testing problems on "association" parameters.

6.1. Problem on the regression coefficient in a bivariate normal population. Let
two variates $x_1$ and $x_2$ be distributed as a bivariate normal with variances $\sigma_1^2$
and $\sigma_2^2$ and correlation coefficient $\rho$, and let the sample variances (on a sample
of size $n + 1$) be denoted by $s_1^2$ and $s_2^2$, and the sample correlation coefficient
by $r$. Also let $b_1 = s_1r/s_2$ and $\beta_1 = \sigma_1\rho/\sigma_2$. It is easy to check that then the
variates $(x_1 - \beta_1x_2)$ and $x_2$ are uncorrelated, so that when the population parameter
is $\beta_1$, the statistic

(6.1.1)   $\sqrt{n - 1}\,r^*/\sqrt{1 - r^{*2}}$

is distributed as Student's $t$ with $(n - 1)$ D.F. Here $r^*$ stands for the sample correlation
between $(x_1 - \beta_1x_2)$ and $x_2$, that is,


$r^* = (s_1s_2r - \beta_1s_2^2)/[(s_1^2 - 2\beta_1s_1s_2r + \beta_1^2s_2^2)^{1/2}s_2]$

$\quad = (s_1r - \beta_1s_2)/[(s_1r - \beta_1s_2)^2 + (1 - r^2)s_1^2]^{1/2}$

$\quad = (b_1 - \beta_1)/[(b_1 - \beta_1)^2 + (1 - r^2)s_1^2/s_2^2]^{1/2},$


and, therefore,


(6.1.2)   $r^*/\sqrt{1 - r^{*2}} = \dfrac{s_2}{s_1}\cdot\dfrac{b_1 - \beta_1}{(1 - r^2)^{1/2}}.$
Now consider the statement


(6.1.3)   $-t_\alpha(n - 1) \le \sqrt{n - 1}\,r^*/\sqrt{1 - r^{*2}} \le t_\alpha(n - 1),$


where $t_\alpha(n - 1)$ gives the upper $\alpha/2$-point of the $t$-distribution with $(n - 1)$
D.F. This is easily seen to reduce to the following confidence statement on $\beta_1$
(with a confidence coefficient $1 - \alpha$):


(6.1.4)   $b_1 - \dfrac{t_\alpha(n - 1)}{\sqrt{n - 1}}\,\dfrac{s_1}{s_2}\,(1 - r^2)^{1/2} \le \beta_1 \le b_1 + \dfrac{t_\alpha(n - 1)}{\sqrt{n - 1}}\,\dfrac{s_1}{s_2}\,(1 - r^2)^{1/2}.$


By inversion of (6.1.4) the test that we obtain for the associated hypothesis
$H_0: \beta_1 = 0$, that is, $\rho = 0$, is easily checked to be the customary test based on
"$r$" and hence just the $t$-test. Similar procedures would go through for "partial
regressions" or "multiple regressions." The interesting point here is that it would
be far more difficult to give corresponding confidence bounds to $\rho$, because this
would have to be done by inverting the distribution of the noncentral $r$,
which is quite complicated.
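
The interval for $\beta_1$ described above is straightforward to compute once the sample moments are in hand. The following sketch assumes bounds of the form $b_1 \pm t_\alpha(n-1)(n-1)^{-1/2}(s_1/s_2)(1 - r^2)^{1/2}$ and takes the $t$ critical value as an externally supplied argument, since no quantile routine is assumed here; the function name and arguments are hypothetical.

```python
import math


def regression_interval(x1, x2, t_crit):
    """Confidence bounds on beta_1, the regression coefficient of x1 on x2.

    x1, x2 : paired observations (sample size n + 1).
    t_crit : ASSUMED, externally supplied upper alpha/2 point of t with n - 1 D.F.
    """
    m = len(x1)                      # sample size n + 1
    n = m - 1
    mean1, mean2 = sum(x1) / m, sum(x2) / m
    s11 = sum((u - mean1) ** 2 for u in x1) / n
    s22 = sum((v - mean2) ** 2 for v in x2) / n
    s12 = sum((u - mean1) * (v - mean2) for u, v in zip(x1, x2)) / n
    r = s12 / math.sqrt(s11 * s22)
    b1 = s12 / s22                   # equals s1 * r / s2
    half = t_crit / math.sqrt(n - 1) * math.sqrt(s11 / s22) * math.sqrt(1.0 - r * r)
    return b1 - half, b1 + half
```

For five pairs ($n = 4$), the relevant $t$ distribution has 3 D.F.; the interval is symmetric about $b_1$ and shrinks as $|r|$ approaches 1.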

6.2. Problem on the regression-like parameters in a (p + q)-variate normal population.
Let us start from a $Y((p + q) \times n): N(0, \Sigma)$, where $p \le q$, $p + q \le n$
and where $\Sigma$ is p.d. and of the form, say,


$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}\begin{matrix} p \\ q \end{matrix}$


so that $\Sigma_{11}$ and $\Sigma_{22}$ themselves are also p.d. In this case, all the $p$ population
canonical correlations, that is, all characteristic roots $\theta_i$'s of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ are
nonnegative and less than 1. If $\Sigma_{12}$ is of rank $s$ $(< p \le q)$, then $s$ of these roots
are positive and the remaining $p - s$ are zero. We use now the theorem that
there exist nonsingular $\mu_1(p \times p)$ and $\mu_2(q \times q)$ such that

$\Sigma_{11} = \mu_1\mu_1'$;   $\Sigma_{22} = \mu_2\mu_2'$

and

$\Sigma_{12}(p \times q) = \mu_1(p \times p)\,(D_{\sqrt\theta}\ \ 0)\,\mu_2'(q \times q)$

where $D_{\sqrt\theta}$ is $p \times p$. If $\Sigma_{12}$ is of rank $p$ and the $\theta_i$'s (now all positive) are distinct,
then this transformation could be made one-to-one by taking

$\mu_2(q \times q) = \begin{pmatrix} \mu_{21} & \tilde\mu_{22} \\ \mu_{23} & \mu_{24} \end{pmatrix}\begin{matrix} q - p \\ p \end{matrix}$   (columns partitioned as $p$, $q - p$)

and adopting the convention, say, that the elements of the first row of $\mu_1$ and
the diagonal elements of $\tilde\mu_{22}$ are all to be positive. If $\Sigma_{12}$ is of rank $s$ $(< p)$ and
the $s$ positive $\theta_i$'s are distinct, then this transformation could be made unique
by taking

$\mu_1(p \times p) = \begin{pmatrix} \mu_{11} & \tilde\mu_{12} \\ \mu_{13} & \mu_{14} \end{pmatrix}\begin{matrix} p - s \\ s \end{matrix}$   (columns partitioned as $s$, $p - s$),

$\mu_2(q \times q) = \begin{pmatrix} \mu_{21} & \tilde\mu_{22} \\ \mu_{23} & \mu_{24} \end{pmatrix}\begin{matrix} q - s \\ s \end{matrix}$   (columns partitioned as $s$, $q - s$),

where $\sim$ over a square matrix indicates that all elements above the diagonal
are zero, and by adopting the convention, say, that the first row of $\mu_{11}$ and the
diagonals of $\tilde\mu_{12}$ and $\tilde\mu_{22}$ are all positive. We shall not need this uniqueness, but
we note that with proper forms for $\mu_1$ and $\mu_2$ the number of independent elements
is the same on both sides and we shall work in terms of $\mu_1$, $\mu_2$ and the $\theta$'s and
not the $\Sigma$'s. We now put


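$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}\begin{matrix} p \\ q \end{matrix}$ (with $n$ columns), so that $YY' = \begin{pmatrix} Y_1Y_1' & Y_1Y_2' \\ Y_2Y_1' & Y_2Y_2' \end{pmatrix} = n\begin{pmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{pmatrix}.$

We next observe that, a.e., $YY'$ is p.d. (which means that $S_{11}$ and $S_{22}$ are p.d.)
and $S_{12}$ is of rank $p$, so that, a.e., all the $p$ characteristic roots of $S_{11}^{-1}S_{12}S_{22}^{-1}S_{12}'$
are $> 0$ and $< 1$. We next note that


$\Sigma = \begin{pmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{pmatrix}\begin{pmatrix} I(p) & (D_{\sqrt\theta}\ \ 0) \\ \begin{pmatrix} D_{\sqrt\theta} \\ 0 \end{pmatrix} & I(q) \end{pmatrix}\begin{pmatrix} \mu_1' & 0 \\ 0 & \mu_2' \end{pmatrix}$

so that


$\Sigma^{-1} = \begin{pmatrix} \mu_1'^{-1} & 0 \\ 0 & \mu_2'^{-1} \end{pmatrix}\begin{pmatrix} D_{1/(1-\theta)} & -(D_{\sqrt\theta/(1-\theta)}\ \ 0) \\ -\begin{pmatrix} D_{\sqrt\theta/(1-\theta)} \\ 0 \end{pmatrix} & \begin{pmatrix} D_{1/(1-\theta)} & 0 \\ 0 & I(q - p) \end{pmatrix} \end{pmatrix}\begin{pmatrix} \mu_1^{-1} & 0 \\ 0 & \mu_2^{-1} \end{pmatrix}.$
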

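Except for the factor $(-\tfrac{1}{2})$, the argument under the exponential in the probability
density of $Y_1$ and $Y_2$ can be now written as $\mathrm{tr}\,CC'$ where


$C = \begin{pmatrix} \mu_1^{-1}Y_1 \\ -\begin{pmatrix} D_{\sqrt{\theta/(1-\theta)}} \\ 0 \end{pmatrix}\mu_1^{-1}Y_1 + \begin{pmatrix} D_{1/\sqrt{1-\theta}} & 0 \\ 0 & I \end{pmatrix}\mu_2^{-1}Y_2 \end{pmatrix}.$

If we put

(6.2.1)   $Z_1 = \mu_1^{-1}Y_1$ and $Z_2 = -\begin{pmatrix} D_{\sqrt{\theta/(1-\theta)}} \\ 0 \end{pmatrix}\mu_1^{-1}Y_1 + \begin{pmatrix} D_{1/\sqrt{1-\theta}} & 0 \\ 0 & I \end{pmatrix}\mu_2^{-1}Y_2,$

it is easy to check that the probability density of $Z_1$ and $Z_2$ is


$(2\pi)^{-(p + q)n/2}\exp[-\tfrac{1}{2}\,\mathrm{tr}\,(Z_1Z_1' + Z_2Z_2')].$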


Here we shall be interested in a set of simultaneous confidence bounds on a
certain class of arbitrary p.d. quadratic functions (see (6.2.9)) of the elements
of the population matrix $\mu_1'^{-1}(D_{\sqrt\theta}\ \ 0)\mu_2'$. For all pairs of nonnull and nonstochastic
$a_1(p \times 1)$ and $a_2(q \times 1)$, consider the set of statements


(6.2.2)   $(a_1'Z_1Z_2'a_2)^2/(a_1'Z_1Z_1'a_1)(a_2'Z_2Z_2'a_2) \le c'.$

For a given $Z_1$, $Z_2$ and $c'$ this is precisely equivalent to the statement

$\sup_{a_1, a_2}\,(a_1'Z_1Z_2'a_2)^2/(a_1'Z_1Z_1'a_1)(a_2'Z_2Z_2'a_2) \le c'$

or that

(6.2.3)   $\theta_p \le c',$

where $\theta_p$ is the largest (and of course positive) characteristic root of


$(Z_1Z_1')^{-1}(Z_1Z_2')(Z_2Z_2')^{-1}(Z_2Z_1').$







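The relevant distribution on the null hypothesis, that is, when the true population
matrix is $\Sigma$, being known, let us determine $c'$ from the relation:
$P(\theta_p \le c' \mid \text{true population matrix} = \Sigma) = 1 - \alpha$, and then write $c'$ as $\theta_\alpha$ or $\theta_\alpha(p, q, n)$.
Next note that, with


(6.2.4)   $\delta_1 = \begin{pmatrix} D_{\sqrt{\theta/(1-\theta)}} \\ 0 \end{pmatrix}\begin{matrix} p \\ q - p \end{matrix}$   and   $\delta_2 = \begin{pmatrix} D_{1/\sqrt{1-\theta}} & 0 \\ 0 & I \end{pmatrix}\begin{matrix} p \\ q - p \end{matrix},$

we have from (6.2.1)


(6.2.5)   $Z_1Z_1' = n\mu_1^{-1}S_{11}\mu_1'^{-1}$;   $Z_1Z_2' = n\mu_1^{-1}[-S_{11}\mu_1'^{-1}\delta_1' + S_{12}\mu_2'^{-1}\delta_2']$;

$Z_2Z_2' = n[\delta_1\mu_1^{-1}S_{11}\mu_1'^{-1}\delta_1' - \delta_1\mu_1^{-1}S_{12}\mu_2'^{-1}\delta_2' - \delta_2\mu_2^{-1}S_{12}'\mu_1'^{-1}\delta_1' + \delta_2\mu_2^{-1}S_{22}\mu_2'^{-1}\delta_2'].$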
If we now put

$a_1'\mu_1^{-1} = b_1'$ and $a_2'\delta_2\mu_2^{-1} = b_2'$


and tie up all relations from (6.2.2) to (6.2.5), we have for all nonnull $a_1$ and $a_2$
and all permissible $\mu_1$, $\mu_2$ and $\theta$'s the following set of simultaneous confidence
interval statements (with a confidence coefficient $1 - \alpha$):


(6.2.6)   $\dfrac{[b_1'(-S_{11}\mu_1'^{-1}\delta_1'\delta_2^{-1}\mu_2' + S_{12})b_2]^2}{\text{denominator}} \le \theta_\alpha(p, q, n),$


where the denominator is


$(b_1'S_{11}b_1)\,[b_2'(\mu_2\delta_2^{-1}\delta_1\mu_1^{-1}S_{11}\mu_1'^{-1}\delta_1'\delta_2^{-1}\mu_2' - \mu_2\delta_2^{-1}\delta_1\mu_1^{-1}S_{12} - (\mu_2\delta_2^{-1}\delta_1\mu_1^{-1}S_{12})' + S_{22})b_2].$
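Note that


(6.2.7)   $\delta_1'\delta_2^{-1} = (D_{\sqrt{\theta/(1-\theta)}}\ \ 0)\begin{pmatrix} D_{\sqrt{1-\theta}} & 0 \\ 0 & I \end{pmatrix} = (D_{\sqrt\theta}\ \ 0)$


so that putting

(6.2.8)   $\beta(p \times q) = \mu_1'^{-1}(D_{\sqrt\theta}\ \ 0)\mu_2'$


we have, for this $\beta$, the set of confidence statements


(6.2.9)   $\dfrac{[b_1'(-S_{11}\beta + S_{12})b_2]^2}{(b_1'S_{11}b_1)\,[b_2'(\beta'S_{11}\beta - \beta'S_{12} - S_{12}'\beta + S_{22})b_2]} \le \theta_\alpha(p, q, n).$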





(6.2.9) gives a set of simultaneous confidence bounds on a class of arbitrary
p.d. quadratic functions of the elements of the population matrix $\beta$ such that
the elements of the observed sample matrices $S_{11}$, $S_{22}$ and $S_{12}$ also enter into
the coefficients of the class of arbitrary functions. It is interesting to observe
that when $p = q = 1$, we may take $\mu_1 = \mu_1' = \sigma_2$, $\mu_2 = \mu_2' = \sigma_1$, and $D_{\sqrt\theta} = \rho$,
so that $\beta = \sigma_1\rho/\sigma_2$ and check that (6.2.9) reduces to (6.1.4) for the regression
coefficient. Indeed the $\beta$ given by (6.2.8) can really be regarded as the regression
of the set of $p$ variates on the set of $q$ variates or in other words, an appropriate
generalization of the bivariate regression coefficient.

It is easy to check by inversion of (6.2.9) that for the associated hypothesis
$H_0: \beta = 0$, that is, $D_{\sqrt\theta} = 0$, that is, $\Sigma_{12} = 0$, we have the critical region obtained
in [1], namely


(6.2.10)   $\phi_p \ge \theta_\alpha(p, q, n),$

where $\phi_p$ is the largest characteristic root of the matrix

$(Y_1Y_1')^{-1}(Y_1Y_2')(Y_2Y_2')^{-1}(Y_2Y_1'),$

that is, of the matrix $S_{11}^{-1}S_{12}S_{22}^{-1}S_{12}'$.

The shortness of the confidence bounds (6.2.9) is tied up with the power of
(6.2.10) which has already been discussed in [1]. By using a set of theorems
closely analogous to that stated in Section 5.3, it is possible to draw out a chain
of useful and interesting results from (6.2.9) much in the same way as (5.3.1)
and (5.3.2) were drawn out of (5.1.5) and (5.2.4). This we reserve for a later
paper.
STATISTICAL SPECTRAL ANALYSIS OF TIME SERIES
ARISING FROM STATIONARY STOCHASTIC
PROCESSES


By Ulf Grenander and Murray Rosenblatt


University of Chicago 
Summary. We consider time series which are realizations of a stochastic 
process. From the time series we construct various estimates of the spectral 
distribution function of the process (Section 3) and we study the sampling dis- 
tributions of some functionals of these estimates (Sections 4-7). We then obtain 
confidence regions for the spectral distribution function and various tests of 
hypotheses in the normal case (Sections 8-10). 


1. Introduction. Let us consider a real discrete parameter stochastic process,
that is, a sequence of stochastic variables $x_t$, $t = \cdots, -1, 0, 1, \cdots$. We
introduce the quantities $Ex_t = m_t$, the mean value sequence, and the covariances
$\rho_{s,t} = E(x_s - m_s)(x_t - m_t)$. The process $y_t = x_t - m_t$ is said to be stationary
in the wide sense if $\rho_{s,t} = \rho_{s-t}$. Then it follows from a theorem
of Herglotz [10] that $\rho_t = \int_{-\pi}^{\pi} e^{it\lambda}\,dF(\lambda)$ where $F(\lambda)$ is a bounded and nondecreasing
function in $(-\pi, \pi)$. $F(\lambda)$ is called the spectral distribution function of
the process as it can be said to describe the distribution of the spectral energy
of the process (see Wold [22], p. 16). As $x_t$ is real, $\rho_t = \rho_{-t}$ and the distribution
of the spectral energy is symmetric about zero.

Since knowledge of the distribution of the spectral energy is equivalent to
knowledge of the covariance sequence, it is a matter of convenience which to
choose in analyzing the process. The reduced process $y_t$ can be written as


(1.1)   $y_t = \int_{-\pi}^{\pi} e^{it\lambda}\,dZ(\lambda)$


[3] where $Z(\lambda)$ is an orthogonal process such that

(1.2)   $EZ(\lambda) = 0$,   $E|Z(\lambda) - Z(\mu)|^2 = |F(\lambda) - F(\mu)|.$

(For a discussion of stochastic integrals used in this paper see J. L. Doob,
Stochastic Processes, John Wiley and Sons, 1953.) Hence $dF(\lambda)$ can be interpreted
as the variance of the stochastic amplitude $dZ(\lambda)$ corresponding to the
harmonic $e^{it\lambda}$ in the Fourier expansion of $y_t$. $F(\lambda)$ seems to be more directly
related to the structure of the process than the covariance sequence. Later on
we shall find other reasons for preferring to deal with the spectrum instead of
the covariances.
We can decompose $F(\lambda)$ into three parts, $F(\lambda) = F_a(\lambda) + F_d(\lambda) + F_s(\lambda)$,
where $F_a(\lambda) = \int_{-\pi}^{\lambda} f(l)\,dl$, $F_d(\lambda) = \sum_{\lambda_\nu \le \lambda}\Delta F(\lambda_\nu)$, and $F_s(\lambda)$ is constant except
on a set of Lebesgue measure zero. Here $f(\lambda) = F'(\lambda)$, the spectral density, is a
nonnegative integrable function and the $\lambda_\nu$'s are the points of discontinuity of
$F(\lambda)$ with associated jumps $\Delta F(\lambda_\nu) > 0$. We are going to deal with the absolutely
continuous case, but will make a few remarks later on concerning possible
discontinuities of the spectral distribution function.

A model of some practical interest is the following one. A linear filter $F$ can
be characterized by its output $a_t$ at time $t$ generated by a unit input at time 0.
It is not necessary for our purpose to demand that $a_t = 0$ for $t < 0$. Suppose
we feed independent random impulses $\xi_t$, identically distributed with mean 0
and standard deviation 1, into the filter. Following Tukey we shall call a process
of the type $\xi_t$ "pure white noise" as contrasted with a more general process $\eta_t$
with $E\eta_t = 0$, $E\eta_s\eta_t = \delta_{s,t}$ (called "white noise"), but where the $\eta$'s are not
necessarily independent. Note that in both cases one has a uniform spectrum,
$F'(\lambda) = (2\pi)^{-1}$. The resulting output at time $t$ is then


(1.3)   $y_t = \sum_{n=-\infty}^{\infty} a_n\xi_{t-n}.$

If $\sum_{n=-\infty}^{\infty} a_n^2 < \infty$, this sum converges with probability 1 (see Lévy [15], pp. 139-142).
If the input had been $\eta_t$, we would only be able to state convergence in
the mean square. In both cases the spectrum is absolutely continuous with
spectral density (see Karhunen [12], p. 71)


$f(\lambda) = \dfrac{1}{2\pi}\left|\sum_{n=-\infty}^{\infty} a_ne^{-in\lambda}\right|^2.$

More restrictive conditions will be imposed on the process later. A process of
the type (1.3) is called a linear process.
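
To make the spectral density of a linear process concrete, the sketch below evaluates $f(\lambda) = (2\pi)^{-1}|\sum_n a_n e^{-in\lambda}|^2$ for a filter with finitely many nonzero weights. The finite filter and the dictionary representation of the weights are assumptions made for this illustration.

```python
import cmath
import math


def spectral_density(weights, lam):
    """f(lambda) for a finite linear filter.

    weights : dict mapping lag n (int, may be negative) to the weight a_n.
    """
    s = sum(a_n * cmath.exp(-1j * n * lam) for n, a_n in weights.items())
    return abs(s) ** 2 / (2.0 * math.pi)
```

With the single weight $a_0 = 1$ this returns the uniform value $(2\pi)^{-1}$ of white noise, and integrating the density over $(-\pi, \pi)$ recovers $\sum a_n^2$, the variance of $y_t$.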
On the other hand, if a real stationary process $y_t$ has an absolutely continuous
spectral distribution function $F(\lambda) = \int_{-\pi}^{\lambda} f(l)\,dl$, Doob (see Proceedings of the
Berkeley Symposium on Mathematical Statistics and Probability, University of
California Press, 1949, p. 327) has shown that


(1.4)   $y_t = \sum_{n=-\infty}^{\infty} a_n\eta_{t-n}$


where we can choose $a_n = \int_{-\pi}^{\pi} e^{in\lambda}\sqrt{f(\lambda)}\,d\lambda$ and $E\eta_n = 0$, $E\eta_s\eta_t = \delta_{s,t}$. The
$\eta$'s are elements of the real Hilbert space spanned by the $y$'s and an appropriately
chosen set of real random variables (see [12], p. 42). They are then real
and so are the $a$'s. In the present paper we shall deal only with the linear processes.
It would be interesting to extend the domain in which our results are
valid to a more general class of processes.

A normally distributed process with an absolutely continuous spectrum is
clearly linear as the $\eta$'s appearing in (1.4) are then automatically independent
and identically distributed.


2. Some methods that have been proposed for time series analysis. An important
group of statistical problems arises in the following way. We observe a
sample $x_1, x_2, \cdots, x_N$ (a time series) from the stochastic process $x_t$ and want
to make inferential statements concerning the covariances, or equivalently the
spectrum of $x_t$. There has been a long history of work aimed at answering such
problems. A considerable part of the literature has been devoted to testing the
independence of the observations, usually in the case of normal time series (for
a list of references see [14]). Other statisticians have studied what one could
call finite parameter schemes. Two important schemes of this type are the
autoregressive scheme (2.1) and the moving average (2.2)


(2.1)   $\sum_{n=0}^{p} b_ny_{t-n} = \xi_t$

(2.2)   $y_t = \sum_{n=0}^{p} \beta_n\xi_{t-n}$


where the $b$'s and the $\beta$'s are constants. Specifying a priori an order $p$ for the
process, they have given estimates for the coefficients and have devised tests
for various hypotheses (for a list of references see [14]). Whittle [21] has gone
further and has given tests for discriminating between finite parameter schemes
of different orders without specifying the coefficients a priori.

Any linear process can be approximated as closely as is desired by a finite 
parameter scheme of sufficiently high order. In most practical situations, how- 
ever, it does not seem realistic to take the order of the scheme as some number 
p, usually small, given a priori. 

A great deal of the older literature has been devoted to the so-called periodogram
analysis (see Kendall [13] for references). This was originally devised to
deal with processes of the type


$x_t = \sum_{\nu=1}^{m}(A_\nu\sin\lambda_\nu t + B_\nu\cos\lambda_\nu t) + \xi_t$

where $m$, $A_\nu$, $B_\nu$, and $\lambda_\nu$ are unknown. Clearly $x_t$ has a spectrum with a discrete
component. To estimate the frequencies $\lambda_\nu$ the following statistic called
the periodogram


$I_N(\lambda) = \dfrac{1}{2\pi N}\left|\sum_{\nu=1}^{N} x_\nu e^{i\nu\lambda}\right|^2 = \dfrac{1}{2\pi N}\left[\left(\sum_{\nu=1}^{N} x_\nu\cos\nu\lambda\right)^2 + \left(\sum_{\nu=1}^{N} x_\nu\sin\nu\lambda\right)^2\right]$

has been used. A rationale for this method is that $I_N(\lambda)$ can be shown to diverge
to infinity as $N \to \infty$ if $\lambda$ coincides with some frequency $\lambda_\nu$. Hence if $I_N(\lambda)$
is large we suspect that $\lambda$ is one of the frequencies $\lambda_\nu$ of the scheme. For a corresponding
test of significance see Fisher [6].
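
The behaviour of the periodogram at a true frequency can be seen in a small computation. The sketch below evaluates $I_N(\lambda) = (2\pi N)^{-1}|\sum_{\nu=1}^{N} x_\nu e^{i\nu\lambda}|^2$ directly; the noise-free cosine series used in the usage note is an assumption chosen purely for illustration.

```python
import cmath
import math


def periodogram(x, lam):
    """I_N(lambda) = |sum_{v=1}^{N} x_v e^{i v lambda}|^2 / (2 pi N)."""
    N = len(x)
    s = sum(x_v * cmath.exp(1j * v * lam) for v, x_v in enumerate(x, start=1))
    return abs(s) ** 2 / (2.0 * math.pi * N)
```

For $x_t = \cos t$, `periodogram(x, 1.0)` grows roughly in proportion to $N$, while the value at an unrelated frequency such as $\lambda = 2$ stays small, which is the rationale described above.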

Periodogram analysis is not immediately applicable to the case of an ab- 
solutely continuous spectrum. However, some work of the last few years has 
indicated that when it is properly modified, it is the most powerful method 
that has been found to work without very special assumptions concerning the 
covariance structure of the process. We would like to mention especially Bart- 
lett [1], [2], and Tukey [20]. 

A brief discussion of some of the results of this paper can be found in [9]. 


3. Some preliminary considerations. Consider a process 


where 1; is the component of the process with a continuous spectrum and the 
other term is the one with a discrete spectrum. If we observe a complete realization
$y_t$, $-\infty < t < \infty$, we can specify the sample value of any $Z_\nu$ with probability
one. However, we cannot estimate $E|Z_\nu|^2 = \Delta F(\lambda_\nu)$ consistently unless
$|Z_\nu|$ is constant with probability one. We note in passing that the model of
random phases in which $Z_\nu = A_\nu e^{i\varphi_\nu}$, where the $\varphi_\nu$'s are independent and uniformly
distributed in $(-\pi, \pi)$ and the $A_\nu$'s are constants, has this property and
is not without interest (see Lévy [16], p. 114).

Although the periodogram is a legitimate tool for estimating the frequencies
$\lambda_\nu$ of a discrete spectrum as has been remarked above, it cannot be used to estimate
the spectral density of an absolutely continuous spectrum consistently
[7]. Still the periodogram plays a fundamental role in our paper as our estimates
are closely related to it.

Let $y_t$ be a linear process and let the $\xi$'s used in constructing the process have
a fourth moment $\mu_4$. We set


$e = \mu_4 - 3.$


Note that $e$ is the fourth cumulant of the random variables $\xi$. The periodogram
$I_N(\lambda)$ of the process $y_t$ then is


$I_N(\lambda) = \dfrac{C_0}{2\pi} + \dfrac{1}{\pi}\sum_{\nu=1}^{N-1} C_\nu\cos\nu\lambda$


where $C_\nu = \dfrac{1}{N}\sum_{j=1}^{N-\nu} y_jy_{j+\nu}$. We are interested in statistics of the form


(3.1)   $\Phi_N = \int_{-\pi}^{\pi} I_N(l)\varphi(l)\,dl$







where $\varphi(l)$ is bounded, symmetric about zero, and has at most a finite number
of discontinuities. Let


(3.2)   $\Phi = \int_{-\pi}^{\pi} f(l)\varphi(l)\,dl.$
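
A weighted periodogram integral of this kind can be approximated numerically. The sketch below rests on two assumptions that should be read as such: the periodogram is taken in the standard sample-covariance form $I_N(\lambda) = (2\pi)^{-1}[C_0 + 2\sum_{\nu=1}^{N-1} C_\nu\cos\nu\lambda]$ with $C_\nu = N^{-1}\sum_j y_jy_{j+\nu}$, and the integral is taken over $(-\pi, \pi)$ by a midpoint rule. All function names are illustrative.

```python
import cmath
import math


def sample_covariances(y):
    """C_v = (1/N) * sum_{j} y_j y_{j+v} for v = 0, ..., N - 1."""
    N = len(y)
    return [sum(y[j] * y[j + v] for j in range(N - v)) / N for v in range(N)]


def periodogram_from_covariances(y, lam):
    """Periodogram written as a cosine series in the sample covariances."""
    C = sample_covariances(y)
    series = C[0] + 2.0 * sum(C[v] * math.cos(v * lam) for v in range(1, len(y)))
    return series / (2.0 * math.pi)


def periodogram_direct(y, lam):
    """Periodogram as the squared modulus of the finite Fourier sum."""
    N = len(y)
    s = sum(y_v * cmath.exp(1j * v * lam) for v, y_v in enumerate(y, start=1))
    return abs(s) ** 2 / (2.0 * math.pi * N)


def weighted_integral(y, phi, grid=400):
    """Midpoint-rule approximation to the integral of I_N(l) * phi(l) over (-pi, pi)."""
    h = 2.0 * math.pi / grid
    points = [-math.pi + (j + 0.5) * h for j in range(grid)]
    return sum(periodogram_direct(y, l) * phi(l) for l in points) * h
```

The two periodogram forms agree at every frequency, and with $\varphi \equiv 1$ the weighted integral returns $C_0$, the sample variance about zero.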


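THEOREM 1. Let $\Phi_N$, $\Phi$ be defined as in (3.1), (3.2) and let $\Psi_N$, $\Psi$ be defined
analogously with weight function $\psi$. If the spectral density $f(l)$ is continuous,


$\lim_{N\to\infty} E\Phi_N = \Phi$,   $\lim_{N\to\infty} E\Psi_N = \Psi$


and


(3.3)   $\lim_{N\to\infty} N\,\mathrm{cov}(\Phi_N, \Psi_N) = e\Phi\Psi + 4\pi\int_{-\pi}^{\pi} f^2(l)\varphi(l)\psi(l)\,dl.$
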


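PROOF. By definition of $\Phi_N$

(3.4)   $E\Phi_N = \int_{-\pi}^{\pi} EI_N(l)\varphi(l)\,dl.$

But using (1.1)

$I_N(l) = \dfrac{1}{2\pi N}\left|\sum_{\nu=1}^{N} y_\nu e^{-i\nu l}\right|^2 = \dfrac{1}{2\pi N}\left|\int_{-\pi}^{\pi} e^{i(\lambda - l)}\,\dfrac{1 - e^{iN(\lambda - l)}}{1 - e^{i(\lambda - l)}}\,dZ(\lambda)\right|^2$


and hence from (1.2)

$EI_N(l) = \dfrac{1}{2\pi N}\int_{-\pi}^{\pi}\left|\dfrac{1 - e^{iN(\lambda - l)}}{1 - e^{i(\lambda - l)}}\right|^2 dE|Z(\lambda)|^2 = \dfrac{1}{2\pi N}\int_{-\pi}^{\pi}\dfrac{\sin^2\frac{N}{2}(\lambda - l)}{\sin^2\frac{1}{2}(\lambda - l)}\,f(\lambda)\,d\lambda$


which tends to $f(l)$ as $N \to \infty$. But the integrand in (3.4) is bounded by
$|\varphi(\lambda)|\max_\lambda f(\lambda)$
so that we have $E\Phi_N \to \Phi$ as $N \to \infty$.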
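Let us now consider $N\,\mathrm{cov}(\Phi_N, \Psi_N)$. We have


$N\,\mathrm{cov}(I_N(\lambda), I_N(\mu)) = \dfrac{1}{4\pi^2N}\sum_{\alpha,\beta,\gamma,\delta=1}^{N}\mathrm{cov}(y_\alpha y_\beta, y_\gamma y_\delta)\,e^{i(\alpha - \beta)\lambda + i(\gamma - \delta)\mu}.$

Now

$Ey_\alpha y_\beta y_\gamma y_\delta = \sum_{\nu_1,\nu_2,\nu_3,\nu_4=-\infty}^{\infty} a_{\alpha-\nu_1}a_{\beta-\nu_2}a_{\gamma-\nu_3}a_{\delta-\nu_4}\,E\xi_{\nu_1}\xi_{\nu_2}\xi_{\nu_3}\xi_{\nu_4}$

$\quad = (\mu_4 - 3)\sum_{\nu=-\infty}^{\infty} a_{\alpha-\nu}a_{\beta-\nu}a_{\gamma-\nu}a_{\delta-\nu} + \sum_{\nu_1,\nu_2=-\infty}^{\infty} a_{\alpha-\nu_1}a_{\beta-\nu_1}a_{\gamma-\nu_2}a_{\delta-\nu_2}$

$\qquad + \sum_{\nu_1,\nu_2=-\infty}^{\infty} a_{\alpha-\nu_1}a_{\gamma-\nu_1}a_{\beta-\nu_2}a_{\delta-\nu_2} + \sum_{\nu_1,\nu_2=-\infty}^{\infty} a_{\alpha-\nu_1}a_{\delta-\nu_1}a_{\beta-\nu_2}a_{\gamma-\nu_2}$

so that

$\mathrm{cov}(y_\alpha y_\beta, y_\gamma y_\delta) = e\sum_{\nu=-\infty}^{\infty} a_{\alpha-\nu}a_{\beta-\nu}a_{\gamma-\nu}a_{\delta-\nu} + \rho_{\alpha-\gamma}\rho_{\beta-\delta} + \rho_{\alpha-\delta}\rho_{\beta-\gamma}.$

The fourth moments of $y_t$ can be shown to exist and the operations that have
been carried out can be justified by a repeated application of Schwarz' inequality
and Fatou's lemma. Hence


$N\,\mathrm{cov}(\Phi_N, \Psi_N) = \dfrac{1}{4\pi^2N}\sum_{\alpha,\beta,\gamma,\delta=1}^{N}\left[e\sum_{\nu=-\infty}^{\infty} a_{\alpha-\nu}a_{\beta-\nu}a_{\gamma-\nu}a_{\delta-\nu} + \rho_{\alpha-\gamma}\rho_{\beta-\delta} + \rho_{\alpha-\delta}\rho_{\beta-\gamma}\right]\varphi_{\alpha-\beta}\psi_{\gamma-\delta}$

$\quad = S_1 + S_2 + S_3,$

say, the three sums corresponding to the three terms in the bracket, where
$\varphi_\nu = \int_{-\pi}^{\pi} e^{i\nu\lambda}\varphi(\lambda)\,d\lambda$ and $\psi_\nu$ is defined analogously.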
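We first deal with $S_1$. Note that

$$S_1 = \frac{e}{4\pi^2 N}\sum_{\nu=-\infty}^{\infty}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\Bigl|\sum_{\alpha=1}^{N} a_{\alpha-\nu} e^{i\alpha\lambda}\Bigr|^2\Bigl|\sum_{\gamma=1}^{N} a_{\gamma-\nu} e^{i\gamma\mu}\Bigr|^2\varphi(\lambda)\psi(\mu)\,d\lambda\,d\mu.$$

First we consider the special case

$$\varphi(\lambda) = e^{-in\lambda},\qquad \psi(\mu) = e^{-im\mu}.$$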


The function 


Cn.m(A) 


is continuous. We have 


2 


l ® (a—y)A 
| ji .ae a= z. Casta eins 


») 
V=—20 


at 


ar 


min(NV,N+m) min(N .N+n) 1 
inX ‘ (é , fy. \ 1(a—y)X 
= > > = | Cn m\A)e dx. 
—w 


Sie C 
y=Max(llin) <7 


so that 


a=max(1,1+m) 


But then 


: y ‘a oe 
sin (V — n)d sin — ~— A 
—_—_—_—— | dn. 
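Using [5] we see that the second term is dominated by a constant times $(\log N)/N \to 0$
as $N \to \infty$. The first term tends to $e\,c_{n,m}(0) = e\rho_n\rho_m$. Now let us consider a
finite trigonometric polynomial

$$h(\lambda, \mu) = \sum_{n,m=-p}^{p} h_{n,m}\,e^{-i(n\lambda + m\mu)}$$

in place of $\varphi(\lambda)\psi(\mu)$ in $S_1$. Then

$$\lim_{N\to\infty} S_1(h) = e\sum_{n,m=-p}^{p} h_{n,m}\,\rho_n\rho_m = e\int_{-\pi}^{\pi}\int_{-\pi}^{\pi} h(\lambda,\mu) f(\lambda) f(\mu)\,d\lambda\,d\mu.$$

Now given any $\varphi(\lambda)$ and $\psi(\mu)$, we can choose two finite trigonometric
polynomials $h_1(\lambda, \mu)$, $h_2(\lambda, \mu)$ such that

$$h_1(\lambda, \mu) \le \varphi(\lambda)\psi(\mu) \le h_2(\lambda, \mu),\qquad \int\!\!\int [h_2(\lambda, \mu) - h_1(\lambda, \mu)]\,d\lambda\,d\mu < \varepsilon.$$

Then $S_1(\varphi(\lambda)\psi(\mu))$ lies between $S_1(h_1(\lambda, \mu))$, $S_1(h_2(\lambda, \mu))$ and hence

$$\limsup_{N\to\infty} S_1(\varphi(\lambda)\psi(\mu)),\qquad \liminf_{N\to\infty} S_1(\varphi(\lambda)\psi(\mu))$$

lie between $e\int\!\!\int h_1(\lambda, \mu) f(\lambda) f(\mu)\,d\lambda\,d\mu$ and $e\int\!\!\int h_2(\lambda, \mu) f(\lambda) f(\mu)\,d\lambda\,d\mu$.

On letting $\varepsilon \to 0$ we see that

$$\lim_{N\to\infty} S_1(\varphi(\lambda)\psi(\mu)) = e\int \varphi(\lambda) f(\lambda)\,d\lambda\int \psi(\mu) f(\mu)\,d\mu.$$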

To evaluate the remainder $S_2 + S_3$ of $N\operatorname{cov}(\Phi_N^*, \Psi_N^*)$ we note that it would be
the actual covariance if the process were normal. But in this case the limiting
covariance has been evaluated in [7] when $\varphi$ and $\psi$ are even, although under
the restriction that the spectral density has a continuous derivative. However,
an argument similar to that just carried out above indicates that the result in
[7] is valid for continuous $f(\lambda)$. This proves (3.3).

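One should note that as the periodogram is an even function, it is sufficient
to consider estimates of the form $\Phi_N^* = \int_0^{\pi} I_N(\lambda)\varphi(\lambda)\,d\lambda$ corresponding to the
weight function $\tfrac{1}{2}[\varphi(\lambda) + \varphi(-\lambda)]$. Expression (3.3) then becomes

$$\lim_{N\to\infty} N \operatorname{cov}(\Phi_N^*, \Psi_N^*) = e\,\Phi\Psi + 2\pi\int_0^{\pi} f^2(l)\,\varphi(l)\,\psi(l)\,dl.$$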
The following theorem gives an estimate of the speed with which $E\Phi_N^*$ converges
to $\Phi$.
THEOREM 2. Suppose that $f(\lambda)$ has a bounded derivative. Then

$$E\Phi_N^* = \Phi + O(\log N/N).$$


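PROOF. By the proof of Theorem 1

$$E\Phi_N^* - \Phi = \frac{1}{2\pi N}\int\!\!\int \frac{\sin^2\frac{N}{2}(l-\lambda)}{\sin^2\frac{1}{2}(l-\lambda)}\,[f(l) - f(\lambda)]\,\varphi(\lambda)\,d\lambda\,dl.$$

But we have uniformly $f(l) - f(\lambda) = O(l - \lambda)$ so that

$$E\Phi_N^* - \Phi = \frac{1}{N}\int O(|x|)\,\frac{\sin^2\frac{N}{2}x}{\sin^2\frac{1}{2}x}\,(2\pi - |x|)\,dx = O(\log N/N)$$

according to [5].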

We remark that if the function $f(\lambda)$ defined on the real line with period $2\pi$
has a bounded second derivative everywhere we get instead $E\Phi_N^* - \Phi = O(1/N)$.

We now prove three elementary lemmas that are required in the develop- 
ment of the main results of the paper. These results have been proved under 
conditions that are probably far from necessary. But we believe these condi- 
tions give an idea of the practical domain of applicability of the results. 

In the remainder of the paper we shall assume that $E|\xi_\nu|^8 < \infty$.


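LEMMA 1. Consider the covariances $c_\nu = \sum_{n=1}^{N-\nu}\xi_n\xi_{n+\nu}$. Then if $\nu, \mu \ne 0$

$$(3.5)\qquad |E\,c_\nu c_{\nu+j} c_\mu c_{\mu+j}| \le \begin{cases} A_1 N & \text{if } \nu \ne \mu \\ A_2 N^2 & \text{if } \nu = \mu. \end{cases}$$

The reader should notice that $C_\nu$ refers to the $y$-process and $c_\nu$ to the $\xi$-process.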

PROOF. It is clear that (3.5) is made up of terms of the type $E[\xi_{n_1}\xi_{n_2}\cdots\xi_{n_8}]$. If
one of the indices $n_i$ is different from the rest this term vanishes. As each of the
terms is bounded by $E|\xi_\nu|^8$, it is sufficient for our purpose to enumerate the
nonvanishing terms. But we have restraints, say

$$n_2 = n_1 + a,\quad n_4 = n_3 + b,\quad n_6 = n_5 + c,\quad n_8 = n_7 + d,$$


where we do not yet specify the integers a, b, c, d except that they should all 
be different from zero. We can then treat the eight variables in a completely 
symmetric way. Let us fix n, . As nz ¥ 1, n2 has to be equal to some other n,, 
say 73. 

Now we separate two cases. As ns * ne, it has to be equal to one of m , ne, 
n3, Ns or to one of n;, ns. Let us consider the first alternative. Then n; has to 
be equal to one of m,, m2, M3, M4, M5, Ne and whichever we choose, we have 


(3.6) n=m+a;, 


As $1 \le n_1 \le N$ this gives us at most $N$ possibilities.

In the second alternative let us take ns = n;. If any of nm , n2, n3, M418 equal 
to any of ns, 6, M7, Ns We have again a system of restrictions of type (3.6) and 
hence again at most N possibilities so we can exclude this case. But no = nz ¥ n; 
and n, * nz = ns so that the only other way of getting a nonvanishing term is 
No = Ng, Ne = Ng, Which requires that a = b,c = d. If that is the case we clearly 
get at most $N^2$ possibilities. The result now follows easily.

LEMMA 2. The distribution of the variables $(c_0 - N)/\sqrt{N}$, $c_1/\sqrt{N}$, $\ldots$, $c_k/\sqrt{N}$
tends to the distribution of $k + 1$ normal and independent variables with mean
zero and variances $e + 2, 1, 1, \ldots, 1$.
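PROOF. Introduce

$$z_j = t_0(\xi_j^2 - 1) + t_1\xi_j\xi_{j+1} + \cdots + t_k\xi_j\xi_{j+k}$$

so that

$$U_N = \frac{1}{\sqrt{N}}\sum_{j=1}^{N} z_j = \frac{1}{\sqrt{N}}\bigl[t_0(c_0 - N) + t_1 c_1' + \cdots + t_k c_k'\bigr]$$

where

$$c_\nu' = c_\nu + \sum_{j=N-\nu+1}^{N}\xi_j\xi_{j+\nu},\qquad \nu > 0.$$

It is evident that $(c_\nu' - c_\nu)/\sqrt{N} \to 0$ in probability as $N \to \infty$. But $z_j$ is a
stationary $(k + 1)$-dependent sequence of random variables. Applying a theorem
of Hoeffding and Robbins [11] we see that $U_N$ is asymptotically normal with
mean zero and variance $t_0^2(e + 2) + \sum_{\nu=1}^{k} t_\nu^2$. Hence

$$\lim_{N\to\infty} E\exp\Bigl\{it_0\frac{c_0 - N}{\sqrt{N}} + i\sum_{\nu=1}^{k} t_\nu\frac{c_\nu}{\sqrt{N}}\Bigr\} = \exp\Bigl\{-\tfrac{1}{2}\Bigl[t_0^2(e + 2) + \sum_{\nu=1}^{k} t_\nu^2\Bigr]\Bigr\}$$

which proves the lemma.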
LEMMA 3. If $y_t$ is a normal process with a positive spectral density having an
integrable second derivative, then $a_\nu$ can be chosen so that $a_\nu = O(1/\nu^2)$.

PROOF. Choose $a_\nu$ so that $a_\nu = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{i\nu\lambda}\sqrt{f(\lambda)}\,d\lambda$ and integrate by parts
twice. The result follows immediately.
4. Treatment of pure white noise. 


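THEOREM 3. Consider the empirical spectral distribution function

$$F_{N,\xi}^*(\lambda) = \int_0^{\lambda} I_{N,\xi}(l)\,dl$$

where

$$I_{N,\xi}(\lambda) = \frac{1}{2\pi N}\Bigl|\sum_{\nu=1}^{N}\xi_\nu e^{i\nu\lambda}\Bigr|^2.$$

Then the limiting probability distribution of

$$\max_{0\le\lambda\le\pi}\sqrt{N}\,\Bigl|F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr|$$

as $N \to \infty$ is the same as that of

$$\max_{0\le\lambda\le\pi}|\zeta(\lambda)|$$

where $\zeta(\lambda)$ is a normally distributed process with mean zero and covariance

$$E\zeta(\lambda)\zeta(\mu) = \frac{e}{4\pi^2}\lambda\mu + \frac{1}{2\pi}\min(\lambda, \mu).$$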
PROOF. We have

$$\sqrt{N}\Bigl[F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr] = \frac{\lambda}{2\pi}\,\frac{c_0 - N}{\sqrt{N}} + \frac{1}{\pi}\sum_{\nu=1}^{N-1}\frac{c_\nu}{\sqrt{N}}\,\frac{\sin\nu\lambda}{\nu} = s_{N,k}(\lambda) + r_{N,k}(\lambda)$$

where $s_{N,k}(\lambda)$ consists of the first term and the first $k$ summands of the
sum. We shall show that with probability close to one, for sufficiently large $k$,
$|r_{N,k}(\lambda)|$ is small uniformly in $N$, $\lambda$.
Let us consider 


f eur 


> c, sin vv - . c, e” 


v=—_m Vv N v vam /N v 
where 0 < m <1 < N. But 


ivd |2 i—m_ 


a Cre | 
hee | Ah ee 
von Vv N v j=0 | v—m N vip _ ) 


tJ 


To get a bound for this sum we consider 


a am, Ec, C.4d Ct 


(4.1) Ey ot ~ 2% or 


=, Nv(v + j) yuom N vy + j)u(u + 9) 
We know by Lemma 1 that

$$|E\,c_\nu c_{\nu+j} c_\mu c_{\mu+j}| \le \begin{cases} A_2 N^2 & \text{if } \nu = \mu \\ A_1 N & \text{if } \nu \ne \mu. \end{cases}$$

Thus the expression in (4.1) is bounded (use the Schwarz inequality) by
me l A, = 1 <i A; 
As = - +— > - ts — ae, 
F x vivtj) °° N tv + j)ulu +7) = 2 v (vy + 3) 


: ) opt rp : : 2 . ‘ 
Now we choose m = 2” and! = 2’*”, Then again using the Schwarz inequality 
we get 


“—~ € #89 vA 
Emax 7. = = 
ects | scar SN v 


Let k = 2”. Then with probability greater than 1 — A;/2’ 


rT? 


; . cy sin vv 
T, = max 3 = 
A x Vume2P Vv N v 


Consequently 
with probability greater than $1 - A_6/2^{n/4}$. Choosing $n$ sufficiently large $|r_{N,k}(\lambda)|$
is uniformly small.
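We want to find the limiting distribution of

$$P\Bigl(\max_{0\le\lambda\le\pi}\sqrt{N}\,\Bigl|F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr| \le a\Bigr).$$

But from what we have just shown, we see that for any $\varepsilon, \delta > 0$

$$P\Bigl(\max_{0\le\lambda\le\pi}|s_{N,k}(\lambda)| \le a - \varepsilon\Bigr) - \delta \le P\Bigl(\max_{0\le\lambda\le\pi}\sqrt{N}\,\Bigl|F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr| \le a\Bigr) \le P\Bigl(\max_{0\le\lambda\le\pi}|s_{N,k}(\lambda)| \le a + \varepsilon\Bigr) + \delta$$

for $k$ sufficiently large, uniformly in $N > N(\varepsilon, \delta)$. But

$$s_{N,k}(\lambda) = \frac{\lambda}{2\pi}\,\frac{c_0 - N}{\sqrt{N}} + \frac{1}{\pi}\sum_{\nu=1}^{k}\frac{c_\nu}{\sqrt{N}}\,\frac{\sin\nu\lambda}{\nu}$$

and we have shown in Lemma 2 that the joint distribution of $(c_0 - N)/\sqrt{N}$,
$c_1/\sqrt{N}$, $\ldots$, $c_k/\sqrt{N}$ ($k$ fixed) converges to the distribution of $k + 1$ independent
normal variables with common mean zero and variances $e + 2, 1, 1, \ldots, 1$.
Consider the related process

$$s_k(\lambda) = \frac{\lambda}{2\pi}\sqrt{e + 2}\,\gamma_0 + \frac{1}{\pi}\sum_{\nu=1}^{k}\gamma_\nu\,\frac{\sin\nu\lambda}{\nu}$$

where the $\gamma$'s are $N(0, 1)$ and independent. It is easily seen that

$$\lim_{N\to\infty} P\Bigl(\max_{0\le\lambda\le\pi}|s_{N,k}(\lambda)| \le a\Bigr) = P\Bigl(\max_{0\le\lambda\le\pi}|s_k(\lambda)| \le a\Bigr)$$

as the relevant point set in $(k + 1)$-space is closed [18]. But on letting $k$ tend
to infinity $s_k(\lambda)$ converges uniformly to

$$\zeta(\lambda) = \frac{\lambda}{2\pi}\sqrt{e + 2}\,\gamma_0 + \frac{1}{\pi}\sum_{\nu=1}^{\infty}\gamma_\nu\,\frac{\sin\nu\lambda}{\nu}$$

(see Paley-Wiener [17], p. 148-151). But then

$$P\Bigl(\max_{0\le\lambda\le\pi}|\zeta(\lambda)| \le a - \varepsilon\Bigr) - \delta \le P\Bigl(\max_{0\le\lambda\le\pi}\sqrt{N}\,\Bigl|F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr| \le a\Bigr) \le P\Bigl(\max_{0\le\lambda\le\pi}|\zeta(\lambda)| \le a + \varepsilon\Bigr) + \delta$$

and if $N \to \infty$ we can let $\delta, \varepsilon \to 0$. Since the distribution function of $\max_{0\le\lambda\le\pi}|\zeta(\lambda)|$
is continuous (see Section 7), this completes the proof.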
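The series representation of the limiting process can be checked against the covariance stated in Theorem 3. The sketch below is illustrative only: the function names are hypothetical, and the truncation length is an arbitrary choice. It sums the covariance of the truncated series term by term and compares it with the closed form $e\lambda\mu/4\pi^2 + \min(\lambda,\mu)/2\pi$.

```python
import math

def series_covariance(lam, mu, e, terms=20000):
    """Covariance of s_k(lam), s_k(mu) for k = terms, using independent N(0,1) coefficients."""
    head = (e + 2.0) * lam * mu / (4.0 * math.pi ** 2)
    tail = sum(math.sin(v * lam) * math.sin(v * mu) / (v * v)
               for v in range(1, terms + 1))
    return head + tail / math.pi ** 2

def limiting_covariance(lam, mu, e):
    """Closed form e*lam*mu/(4 pi^2) + min(lam, mu)/(2 pi) for 0 <= lam, mu <= pi."""
    return e * lam * mu / (4.0 * math.pi ** 2) + min(lam, mu) / (2.0 * math.pi)
```

The agreement rests on the classical identity $\sum_{\nu\ge1}\cos(\nu x)/\nu^2 = \pi^2/6 - \pi x/2 + x^2/4$ for $0 \le x \le 2\pi$.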


5. Reduction to pure white noise. Let $G(\lambda) = \int_0^{\lambda} f^2(l)\,dl$.

THEOREM 4. Let $f(\lambda)$ be a nonnegative absolutely continuous function. Then

$$\lim_{N\to\infty} P\Bigl\{\max_{0\le\lambda\le\pi}\sqrt{N}\,\Bigl|\int_0^{\lambda} f(l)\{2\pi I_{N,\xi}(l) - 1\}\,dl\Bigr| \le a\Bigr\} = P\Bigl\{\max_{0\le\lambda\le\pi}|\eta(\lambda)| \le a\Bigr\}$$

where $\eta(\lambda)$ is the normal process with $E\eta(\lambda) = 0$, $E\eta(\lambda)\eta(\mu) = eF(\lambda)F(\mu)
+ 2\pi G(\min(\lambda, \mu))$.
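PROOF. Integrating by parts we have

$$\int_0^{\lambda} f(l)\{2\pi I_{N,\xi}(l) - 1\}\,dl = 2\pi f(\lambda)\Bigl[F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr] - \int_0^{\lambda} 2\pi f'(l)\Bigl[F_{N,\xi}^*(l) - \frac{l}{2\pi}\Bigr]\,dl.$$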


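Given $\varepsilon > 0$, by Theorem 3 we know that

$$\max_{0\le\lambda\le\pi}\Bigl|\sqrt{N}\Bigl(F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr) - s_{N,k}(\lambda)\Bigr| < \varepsilon$$

with probability $1 - \varepsilon$ uniformly in $N > N(\varepsilon)$ for $k$ sufficiently large. But then

$$\max_{0\le\lambda\le\pi}\Bigl|f(\lambda)\Bigl[\sqrt{N}\Bigl(F_{N,\xi}^*(\lambda) - \frac{\lambda}{2\pi}\Bigr) - s_{N,k}(\lambda)\Bigr] - \int_0^{\lambda} f'(l)\Bigl[\sqrt{N}\Bigl(F_{N,\xi}^*(l) - \frac{l}{2\pi}\Bigr) - s_{N,k}(l)\Bigr]\,dl\Bigr| < B_1\varepsilon$$

with probability $1 - \varepsilon$. However,

$$P\Bigl\{\max_{0\le\lambda\le\pi}\Bigl|f(\lambda)s_{N,k}(\lambda) - \int_0^{\lambda} f'(l)s_{N,k}(l)\,dl\Bigr| \le a\Bigr\} \to P\Bigl\{\max_{0\le\lambda\le\pi}\Bigl|f(\lambda)s_k(\lambda) - \int_0^{\lambda} f'(l)s_k(l)\,dl\Bigr| \le a\Bigr\}$$

as $N \to \infty$ as the relevant set is again closed in $(k + 1)$-space. We know that
for any $\varepsilon > 0$

$$\max_{0\le\lambda\le\pi}|s_k(\lambda) - \zeta(\lambda)| < \varepsilon$$

with probability $1 - \varepsilon$ for sufficiently large $k$. Let

$$\eta(\lambda) = 2\pi\Bigl[f(\lambda)\zeta(\lambda) - \int_0^{\lambda} f'(l)\zeta(l)\,dl\Bigr].$$

In summary

$$P\Bigl\{\max_{0\le\lambda\le\pi}|\eta(\lambda)| \le a - \varepsilon\Bigr\} - \delta \le P\Bigl\{\max_{0\le\lambda\le\pi}\sqrt{N}\,\Bigl|\int_0^{\lambda} f(l)\{2\pi I_{N,\xi}(l) - 1\}\,dl\Bigr| \le a\Bigr\} \le P\Bigl\{\max_{0\le\lambda\le\pi}|\eta(\lambda)| \le a + \varepsilon\Bigr\} + \delta$$

and on letting $N \to \infty$ so that $\varepsilon, \delta \to 0$ we obtain the desired result. Note that
the event $\max_{0\le\lambda\le\pi}|\eta(\lambda)| \le a$ has a well defined probability since $\eta(\lambda)$ is a
process with continuous sample functions with probability one.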
THEOREM 5. Let $a_\nu = O(\nu^{\beta})$ where $\beta < -\tfrac{3}{2}$. Then given any $\varepsilon > 0$

$$\sqrt{N}\max_{0\le\lambda\le\pi}\Bigl|\int_0^{\lambda}[I_N(l) - 2\pi f(l)I_{N,\xi}(l)]\,dl\Bigr| < \varepsilon$$

with probability $1 - \varepsilon$ for $N > N_\varepsilon$.
Proor. We have 


d 


(5.1) 2nVN | [s() — 2nf(Iye())] dl = 


«2 


- Oy Os Uys 


1 
I VN XZ 


where 


i(n—m)r_ 1 


i(n—m) x 
e — ] e 
Sony Fa (_—— = =. | a, a ee 
mal i(n — m) nal+7.N+r i(n — m) 
m-=-1+8,N+8 
The coefficients of the terms when $n = m$ should be interpreted as $\lambda$. We note
that there may be a lattice rectangle $R_{r,s}^{(N)}$ of points $(n, m)$ common to both
sums. Let us call the complement of $R_{r,s}^{(N)}$ with respect to the set consisting of
all the lattice points in both summations $C_{r,s}^{(N)}$. Then we have


E max |d,,/| 


Lohee nmeC. 


< Do ny gn — m) 
38 
where 


es 
— if <0 


x 


r ifx=0. 
It is easily verified that 


+ 2N log Nif|r| > Nor|s|>WN 
ux) gin — m) S&S 

n,me¢ . } yT . 
4(s + r) log N otherwise. 
The expression in (5.1) is then bounded by 


9 


x ? 8 1 é 
V/N 2» Xu sail _ log . - V/N zu ad. — ie log " 


/ L 


, 8 log N 
<(X ja |)(AVN log O laej + a= YE rial). 
v4 AN VN ifzy 
Under the assumption made above, this expression tends to zero as N — « 
which proves the theorem. 

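6. Treatment of the general case. Let $F_N^*(\lambda) = \int_0^{\lambda} I_N(l)\,dl$.

THEOREM 6. Let
1. $f(\lambda)$ be absolutely continuous
2. $a_\nu = O(\nu^{\beta})$, $\beta < -\tfrac{3}{2}$.
Then

$$\lim_{N\to\infty} P\Bigl\{\max_{0\le\lambda\le\pi}\sqrt{N}\,|F_N^*(\lambda) - F(\lambda)| \le a\Bigr\} = P\Bigl\{\max_{0\le\lambda\le\pi}|\eta(\lambda)| \le a\Bigr\}$$

where $\eta(\lambda)$ is the process defined in Section 5.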

The proof follows immediately from Theorems 3, 4 and 5. An important 
special case is contained in the following corollary. 

COROLLARY 1. When $y_t$ is a normal process with a positive spectral density with
an integrable second derivative, Theorem 6 reduces to the following statement:

$$(6.1)\qquad \lim_{N\to\infty} P\Bigl\{\max_{0\le\lambda\le\pi}\sqrt{N}\,|F_N^*(\lambda) - F(\lambda)| \le a\Bigr\} = \sum_{k=-\infty}^{\infty}(-1)^k\bigl[\Phi((2k + 1)a/x) - \Phi((2k - 1)a/x)\bigr] = A(a/x)$$

where $\Phi(u)$ is the normal distribution function and $x = \sqrt{2\pi G(\pi)}$.
PROOF. Lemma 3 implies that the assumptions of Theorem 6 are satisfied.
As $e = 0$ the process $\eta(\lambda)$ reduces to the Wiener process with the following
changed scale of time: $t = 2\pi G(\lambda)$, $0 \le \lambda \le \pi$. The reader may note that (6.1)
is the probability that a particle in Brownian motion on the line is not absorbed
by the barriers $a$, $-a$ in the time interval $0$, $2\pi G(\pi)$, given that the particle
starts from 0 at time 0 (see Sommerfeld [19], pp. 74-79).
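The series in (6.1) is straightforward to evaluate numerically. The following sketch is an illustration under stated assumptions: the function names, truncation limits and bisection bracket are hypothetical choices, and `band_probability_alt` uses the classical eigenfunction expansion for the same Brownian-motion exit probability, which does not appear in the paper and is included only as a cross-check.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def band_probability(u, kmax=50):
    """A(u) = sum_k (-1)^k [Phi((2k+1)u) - Phi((2k-1)u)], truncated at |k| <= kmax."""
    total = 0.0
    for k in range(-kmax, kmax + 1):
        sign = -1.0 if k % 2 else 1.0
        total += sign * (norm_cdf((2 * k + 1) * u) - norm_cdf((2 * k - 1) * u))
    return total

def band_probability_alt(u, jmax=50):
    """Cross-check: (4/pi) sum_j (-1)^j/(2j+1) exp(-(2j+1)^2 pi^2 / (8 u^2))."""
    return (4.0 / math.pi) * sum(
        (-1.0) ** j / (2 * j + 1)
        * math.exp(-((2 * j + 1) ** 2) * math.pi ** 2 / (8.0 * u * u))
        for j in range(jmax))

def critical_value(prob, lo=0.2, hi=10.0):
    """Bisection for u with A(u) = prob; assumes A(lo) < prob < A(hi)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if band_probability(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the setting of Corollary 1 the argument is $u = a/x$ with $x = \sqrt{2\pi G(\pi)}$, so a critical value for $u$ translates into a bound $a = u\,x$ on the maximum deviation.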


7. The limiting distribution in the case of pure white noise. When the process
$y_t$ is pure white noise (so that $f(\lambda)$ = constant, say 1) the limiting probability
we are interested in is given by

$$(7.1)\qquad P\Bigl\{\max_{0\le\lambda\le\pi}|\zeta(\lambda)| \le a\Bigr\}$$

where $\zeta(\lambda)$ is the normal process with

$$E\zeta(\lambda) = 0,\qquad E\zeta(\lambda)\zeta(\mu) = e\lambda\mu + 2\pi\min(\lambda, \mu).$$

Now (7.1) has been explicitly evaluated when $e = 0$ (see (6.1)). However, the
limiting distribution can still be evaluated when $e \ne 0$.
THEOREM 7. 


a 


P} max (Xd) | Saj= 2. On 4)" k2a2/r2) (1—(y/2 
O<Asr 
| Vv ye ( ahi = — a( ¥ i 
\ T 7 \ Tv 


where $\gamma = e + 2$.

PROOF. Let $\tau = \lambda/\pi$. Let $X$ be a normal random variable with mean zero and
variance one. Let $X(\tau)$ be a normal process with

$$EX(\tau) = 0,\qquad EX(\tau)X(\tau') = \min(\tau, \tau') - \tau\tau',$$

that is, $X(\tau)$ is the Wiener process conditioned so that $X(1) = 0$. $X$ and $X(\tau)$
are independent. Consider the process $u(\tau) = \sqrt{\gamma}\,\pi\tau X + \pi\sqrt{2}\,X(\tau)$. The process
$u(\tau)$ has the same probability distribution as $\zeta(\lambda)$ so that

$$P\Bigl\{\max_{0\le\lambda\le\pi}|\zeta(\lambda)| \le a\Bigr\} = P\Bigl\{\max_{0\le\tau\le1}|u(\tau)| \le a\Bigr\}.$$

But

$$(7.2)\qquad P\Bigl\{\max_{0\le\tau\le1}|u(\tau)| \le a\Bigr\} = \int_{-\infty}^{\infty}\varphi(x)\,P\Bigl\{-\frac{a}{\pi\sqrt{2}} - \frac{\sqrt{\gamma}\,\tau x}{\sqrt{2}} \le X(\tau) \le \frac{a}{\pi\sqrt{2}} - \frac{\sqrt{\gamma}\,\tau x}{\sqrt{2}};\ 0 \le \tau \le 1\Bigr\}\,dx$$

where $\varphi(x)$ is the normal frequency function. Let

$$Y(t) = (t + 1)\,X\Bigl(\frac{t}{t + 1}\Bigr),\qquad 0 \le t < \infty.$$

Then $Y(t)$ is the Wiener process (see [4]), $0 \le t < \infty$. We have then to consider

$$(7.3)\qquad P\{-\alpha - (\alpha + b)t \le Y(t) \le \alpha + (\alpha - b)t;\ 0 \le t < \infty\}$$

where $\alpha = a/(\pi\sqrt{2})$, $b = \sqrt{\gamma}\,x/\sqrt{2}$. The integrand in (7.2) is clearly zero unless
$|x| < a/(\pi\sqrt{\gamma})$. But supposing this inequality to hold, Doob [4] has evaluated
(7.3) as


x 
=2>) (-1)""e 
n=l 
Thus (7.1) is equal to 


& =. _ af Vz) - 220 (= 
1 J ( Y ~ 


1 —(n%a2/e®) [ VY 
TN 1 Ln -_ 


V ya 
g(x) cosh n — £a2 
a/ry ¥ Tr 

zx 
a an 
2 — @i{ - 
TV 4 


Bd > (— a 
Y TV ¥ 
ih (sv + 


ni r2)—(ya2/2¥2)] 
e 
n=l 
n * a 
) — @ (= ee 
T TV 4 


, —nvy ya 
y * P 


<“- ny yo a )s 


a Ty 4 


3 
cosh 2nba. 


r / ag 
] ya l 
[6+ 6-9)h 
 \ 7 r Y 
This expression parallels (6.1) and will be used in a later publication in which 
we will study some applications of the theory. 


8. Statistical applications. The corollary of Theorem 6 lends itself to important
statistical applications. First, however, we will have to estimate $G(\pi)$ which
is in general unknown. We note that $G(\pi) = (1/4\pi)\sum_{\nu=-\infty}^{\infty}\rho_\nu^2$. Of course the $C_\nu/N$
are consistent estimates of $\rho_\nu$. It is natural to consider an estimate of the form
(8.1). One reason for this particular choice is that in practice one will try to
avoid the cumbersome calculation of all the lagged product sums $C_\nu$.
LEMMA 4. If $a_\nu = O(\nu^{\beta})$, $\beta < -1$, the statistic

$$(8.1)\qquad G^*(\pi) = \frac{1}{4\pi N^2}\Bigl[C_0^2 + 2\sum_{\nu=1}^{[KN^{\alpha}]} C_\nu^2\Bigr],\qquad 0 < \alpha < 1,$$

is a consistent estimate of $G(\pi)$.
PROOF.

$$(8.2)\qquad G^*(\pi) = \frac{1}{4\pi N^2}\Bigl[C_0^2 + 2\sum_{\nu=1}^{m} C_\nu^2\Bigr] + \frac{1}{2\pi N^2}\sum_{\nu=m+1}^{[KN^{\alpha}]} C_\nu^2.$$
But we see using (3.5) that the last sum is less than or equal to 


[ po} N oo 
j ' 
z=. » {| Qj, Bj4v—9, Akay Ak+v-f2 | + | 8, 2j4+v-82 Ak-8, Ak+»—8 | 
veem+1 j,kel 8) 8y.e—o0 


+ | jg, Oj4y-g, Akg, Arya, |} (4 + ws) (Dor + Doe + Dus) (4 + a) 


Let the $r_\nu$ be understood to be the covariances of the process $\sum_{\nu=-\infty}^{\infty}|a_{n-\nu}|\,\xi_\nu$.
The spectral density $(2\pi)^{-1}\bigl|\sum_\nu |a_\nu|\,e^{i\nu\lambda}\bigr|^2$ of this process is continuous and hence
quadratically integrable, so that $\sum_{\nu=-\infty}^{\infty} r_\nu^2 < \infty$. But then


[KN] 0 


lal= DS Nessun > 
vaem+1 v¥=m+1 
so that | 0; |/N* < € for m sufficiently large uniformly in N. Also we get 


] a FATS oy x , 
ao r, = KN ! tH r, —O 


N 2 


[KN°] ON [KNe 


cA 22 ig Zz z ris ye 
Mec vam+l j k=l i ean 


N? 


as $N \to \infty$. The third sum is handled in almost the same way but using Schwarz'
inequality. We now choose $m$ so large that the last sum in (8.2) is less than $\delta$
with probability larger than $1 - \delta$. The term in brackets in (8.2) consists of a
fixed number of terms each of which converges in probability to the corresponding
term in the expression for $G(\pi)$. This proves the lemma.

The corollary of Theorem 6 shows that in the normal domain the asymptotic
distribution of $\max_\lambda \sqrt{N}\,|F_N^*(\lambda) - F(\lambda)|$ depends only on one parameter $G(\pi)$.
This together with Lemma 4 enables us to construct confidence bands for $F(\lambda)$.

THEOREM 8. Suppose that $y_t$ is a normal process and that its spectral density is
positive and has an integrable second derivative. Then

$$(8.3)\qquad F_N^*(\lambda) - a\sqrt{\frac{2\pi G^*(\pi)}{N}} \le F(\lambda) \le F_N^*(\lambda) + a\sqrt{\frac{2\pi G^*(\pi)}{N}}$$

holds with a probability tending to $A(a)$ as $N$ tends to infinity.

The proof follows immediately from Lemma 4 and Corollary 1 as $A(a)$ is
continuous.
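The construction of the band (8.3) can be sketched as follows. This is a minimal illustration assuming a zero-mean series; the function names and the default truncation constants `K` and `alpha` are hypothetical, and the multiplier `a` is supplied by the user so that $A(a)$ equals the desired asymptotic coverage.

```python
import math

def lagged_products(y, max_lag):
    """Return [C_0, ..., C_max_lag] with C_v = sum_j y_j * y_{j+v}."""
    n = len(y)
    return [sum(y[j] * y[j + v] for j in range(n - v)) for v in range(max_lag + 1)]

def g_star(y, K=1.0, alpha=0.5):
    """Statistic (8.1): (C_0^2 + 2 sum_{v=1}^{[K N^alpha]} C_v^2) / (4 pi N^2)."""
    n = len(y)
    trunc = min(n - 1, int(K * n ** alpha))  # truncation point [K N^alpha]
    c = lagged_products(y, trunc)
    return (c[0] ** 2 + 2.0 * sum(cv ** 2 for cv in c[1:])) / (4.0 * math.pi * n * n)

def empirical_spectral_df(y, lam):
    """F*_N(lam) = C_0 lam/(2 pi N) + (1/(pi N)) sum_v C_v sin(v lam)/v."""
    n = len(y)
    c = lagged_products(y, n - 1)
    tail = sum(c[v] * math.sin(v * lam) / v for v in range(1, n))
    return c[0] * lam / (2.0 * math.pi * n) + tail / (math.pi * n)

def confidence_band(y, lam, a, K=1.0, alpha=0.5):
    """Lower and upper limits of (8.3) at the frequency lam."""
    half = a * math.sqrt(2.0 * math.pi * g_star(y, K, alpha) / len(y))
    center = empirical_spectral_df(y, lam)
    return center - half, center + half
```

The half-width does not depend on $\lambda$, so the band is a strip of constant width around the empirical spectral distribution function.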

Clearly Theorem 8 also gives a test of significance for the simple hypothesis
of a completely specified spectrum. A more interesting case, however, is the
following.

THEOREM 9. From a stochastic process satisfying Theorem 8 we observe two
independent time series of length $N_1$ and $N_2$. Compute the corresponding estimates
$F_{N_1}^*(\lambda)$, $F_{N_2}^*(\lambda)$, $G_1^*(\pi)$, $G_2^*(\pi)$. Then, assuming that $N_1/N_2 \to c > 0$ and putting
$N = 2N_1N_2/(N_1 + N_2)$,

$$(8.4)\qquad \frac{\max_{0\le\lambda\le\pi}\sqrt{N}\,|F_{N_1}^*(\lambda) - F_{N_2}^*(\lambda)|}{\sqrt{2\pi[G_1^*(\pi) + G_2^*(\pi)]}} > a$$

holds with a probability tending to $1 - A(a)$ as $N$ tends to infinity under the
hypothesis $F_1(\lambda) = F_2(\lambda)$.

Theorems 8 and 9 are somewhat restricted in their applicability as they stand,
since in many contexts $m_t = Ex_t$ is not identically zero. We shall consider the
case

$$m_t = d_1\varphi_t^{(1)} + d_2\varphi_t^{(2)} + \cdots + d_p\varphi_t^{(p)},$$

where $\varphi_t^{(1)}, \varphi_t^{(2)}, \ldots, \varphi_t^{(p)}$ are given sequences and the regression coefficients $d_\nu$
are unknown. To avoid unnecessary complications we will confine ourselves to
the case $p = 2$, which illustrates the general situation. We have to introduce
the following condition which prevents the two regression variables $\varphi_t^{(1)}$ and $\varphi_t^{(2)}$
from becoming linearly dependent in the limit


(8.5) 


We will use the least square estimates $d_1^*$, $d_2^*$ of $d_1$, $d_2$ as they have been shown
to be asymptotically efficient [8].

THEOREM 10. Under the conditions of Theorems 8 and 9 and (8.5), formulas
(8.3) and (8.4) remain valid if $F_N^*(\lambda)$ is computed using $x_t - d_1^*\varphi_t^{(1)} - d_2^*\varphi_t^{(2)}$ in
place of $y_t$.

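PROOF. It is sufficient to prove that

$$\sqrt{N}\max_{0\le\lambda\le\pi}\Bigl|\frac{1}{2\pi N}\int_0^{\lambda}\Bigl|\sum_{\nu=1}^{N}\bigl(x_\nu - d_1^*\varphi_\nu^{(1)} - d_2^*\varphi_\nu^{(2)}\bigr)e^{i\nu l}\Bigr|^2 dl - \frac{1}{2\pi N}\int_0^{\lambda}\Bigl|\sum_{\nu=1}^{N} y_\nu e^{i\nu l}\Bigr|^2 dl\Bigr|$$

tends to zero in probability as $N \to \infty$. The expression inside the absolute value
sign is

$$\frac{1}{2\pi N}\sum_{\nu,\mu=1}^{N}\bigl[-2(d_1^* - d_1)\varphi_\nu^{(1)}y_\mu - 2(d_2^* - d_2)\varphi_\nu^{(2)}y_\mu + (d_1^* - d_1)^2\varphi_\nu^{(1)}\varphi_\mu^{(1)} + (d_2^* - d_2)^2\varphi_\nu^{(2)}\varphi_\mu^{(2)} + 2(d_1^* - d_1)(d_2^* - d_2)\varphi_\nu^{(1)}\varphi_\mu^{(2)}\bigr]\frac{\sin(\nu - \mu)\lambda}{\nu - \mu}$$

$$= \Sigma_1 + \Sigma_2 + \Sigma_3 + \Sigma_4 + \Sigma_5$$

where the same convention as before is used in interpreting the terms with
$\nu = \mu$. The least square estimates $d_1^*$, $d_2^*$ are unbiased linear estimates with
coefficients depending only upon the $\varphi_t^{(1)}$'s and $\varphi_t^{(2)}$'s. A simple argument using
the fact that the spectral density is bounded shows that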

(2)2 


fon De Ss , A 
(8.6) vard; sS max f(\) = 


. . wa: @: (1) (2)\2 
s\st Le L# ~Gee 


27 max f(A) 
O<A<4r 
1—R = : 


>» 
’ 


1 


Now 


] f N=l1 N—p 


n/N max | 2n| Vy id — | 5 <1 2, sel, 


O<sA<r p=! DP | v= 


B 1 


N 2 N | 
t+aridi—adl|-| Dyes?! +)dF-ad| Dd ian » Yrerep |? 
v=l—p 


p=—N+1 P 
But we have 


N—p 2 NV 
, - 1) ‘ “/ (1 2 
(87) ab> nel | < 2r max f(r) Do! 


veel vesl 


r _2 . ey: . * 1s 
We know from (8.6) that with as large a probability as desired | dj — d,| is 


' 


less than k/* Lie *. Then it follows from Schwarz’ inequality and (8.7) 


that with large probability 


aW/N max | >| < K’ log N/V/N 0. 


O<A<r 


>=: can be handled in the same way. Now 


hl 2 A N 
2rV/N max 2 = di ~ a) |  ™ ese” 
Q<A\<3r Vv N “0 1 
and the expectation of this tends to zero as $N \to \infty$. $\Sigma_4$ and $\Sigma_5$ can be treated in a
similar manner which proves the theorem.

An important special case is $\varphi_t = 1$, which corresponds to a constant unknown
mean value of the process. Another situation of some interest arises when the
spectrum of the process has a discrete component with frequencies $\lambda_1, \lambda_2, \ldots, \lambda_p$.
There we take the $\varphi_t$'s as trigonometric functions with these frequencies.


9. Alternative estimates of the spectral distribution function. It is clear that
Theorems 6, and 8-10 are still valid if the estimate $F_N^*(\lambda)$ is replaced by a truncated
estimate

$$F_{N,h_N}^*(\lambda) = \frac{C_0\lambda}{2\pi N} + \frac{1}{\pi N}\sum_{\nu=1}^{h_N} C_\nu\frac{\sin\nu\lambda}{\nu} = \frac{1}{2\pi}\int_0^{\lambda}\int_{-\pi}^{\pi} I_N(l)\,\frac{\sin(h_N + \tfrac{1}{2})(l - \mu)}{\sin\tfrac{1}{2}(l - \mu)}\,dl\,d\mu$$

where $h_N \to \infty$ as $N \to \infty$, as they can be proved in exactly the same manner
since the weighting factors are 1 from $\nu = 0$ up to $\nu = h_N$ and 0 from that point
on. Note that the estimate $F_{N,h_N}^*(\lambda)$, in general, is not nondecreasing with probability
1 which may at times be an unpleasant feature. We can choose $h_N = [kN^{\alpha}]$
as in Lemma 4. This reduces the computational work considerably as one then
only needs to compute the $C_\nu$'s for $\nu \le [kN^{\alpha}]$.


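We shall now consider a class of estimates of the spectral distribution function
with nondecreasing weight functions. Let $W_N(\lambda) = \int_{-\pi}^{\lambda} w_N(l)\,dl$ where
$w_N(l) \ge 0$ in $(-\pi, \pi)$, $\int_{-\pi}^{\pi} w_N(l)\,dl = 1$ and

$$\lim_{N\to\infty} W_N(\lambda) = \begin{cases} 0 & \text{if } \lambda < 0 \\ 1 & \text{if } \lambda > 0. \end{cases}$$

Let

$$F_N^*(\lambda, W_N) = \int_0^{\pi} I_N(l)\,W_N(\lambda - l)\,dl,$$

$$F(\lambda, W_N) = \int_0^{\pi} f(l)\,W_N(\lambda - l)\,dl.$$

We prove the following theorem.
THEOREM 11. Under the conditions of Theorem 6

$$\lim_{N\to\infty} P\Bigl\{\max_{0\le\lambda\le\pi}\sqrt{N}\,|F_N^*(\lambda, W_N) - F(\lambda, W_N)| \le a\Bigr\} = P\Bigl\{\max_{0\le\lambda\le\pi}|\eta(\lambda)| \le a\Bigr\}.$$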


Proor. On integrating by parts we have 


VNIFSOA, Ws) — FO, Wy) = VN | Ue — £0] dW — ») 
(9.1) i 


al 


+ [ VN | [Tv(u) — f(u)] duws(d — JD) di. 


By Theorem 5 we can replace (9.1) by 


J/N [ fDl2rTy.(0 — 1] dlWx(\ — x) 


QQ?) 
(Je 7 pl 
/ 


/N | [2rTy,e(u) — 1 flu) duws(d — D) dl 
“0 “0 
committing an error of at most $\varepsilon > 0$ uniformly in $\lambda$ with probability $1 - \varepsilon$.
However, on integrating by parts twice we obtain 

* i 

N 


VN2alFe (x) — Uf(r)We(\ — x) — | WN2e Ez - 4 


) 
+f 


“DW 90 — 1)) dl. 


( 







"4 1 
= Sx m(A) <e€ 


VN max |Fy(A) — — —- 
0<\< | or Vv \ 


® 


with probability 1 — ¢ where m is a large but fixed number. We can then replace 
(9.3) by 


.7 


| ° i W (r 1) , (1 Il Co N 
\ = SN im = - = 
-0 J : ) - aoe } . ory N I 


F 


SOWA — YD dl 


v Z | Si cos If(DWy(A — D dl 


with an error of at most e uniformly in \. But then reasoning as in Theorem 4 
we get 


r 


lim P) max || s()Wx(A — Usxn(l) dl| S a} 


Nx \OsvA<r | 0 i 


r 
= P, max | f{Os,() dl) < 
O<A< fr | of j 


making use of the fact that 


es 


cos vIf(l)Wx(d — 1) dl — | cos vIf(l) dl 


/0 


\ 


WD d+] |W) -—1\ds—0 
® “0 ) 


<= max f(A) ‘| 
( A<r L = 


as N > o. But 


max | | f(Ds,(D) dl — n(\), <« 
O<rX\<r j 


with large probability if m is sufficiently large and the theorem follows imme- 
diately. 

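We are usually more interested in estimating $F(\lambda)$ than $F(\lambda, W_N)$. The following
corollary enables us to do this.

COROLLARY 2. Theorems 6, 8-10 remain valid when $F_N^*(\lambda)$ is replaced by
$F_N^*(\lambda, W_N)$ if

$$(9.4)\qquad \int_{-\pi}^{0} W_N(\lambda)\,d\lambda + \int_0^{\pi}[1 - W_N(\lambda)]\,d\lambda = o(N^{-1/2}).$$

PROOF. The proof follows immediately as

$$\max_{0\le\lambda\le\pi}\sqrt{N}\,|F(\lambda) - F(\lambda, W_N)| \le \sqrt{N}\max_{0\le\lambda\le\pi} f(\lambda)\Bigl\{\int_{-\pi}^{0} W_N(\lambda)\,d\lambda + \int_0^{\pi}[1 - W_N(\lambda)]\,d\lambda\Bigr\} = o(1).$$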
It is clear from the comment above that the time series analyst has a large
class of estimates of the spectral distribution function at his disposal. At present
we are not able to claim that certain estimates are preferable to others. We
hope to investigate such problems in a later paper.

It is worthwhile noting that the Fejér kernel

$$w_N(\lambda) = \frac{1}{2\pi h_N}\,\frac{\sin^2\tfrac{1}{2}h_N\lambda}{\sin^2\tfrac{1}{2}\lambda}$$

satisfies (9.4) if $\log h_N/h_N = o(N^{-1/2})$, so that we can choose the truncation point
$h_N$ as $h_N = [kN^{\alpha}]$, $\tfrac{1}{2} < \alpha < 1$. Note that the estimate corresponding to this
kernel is

$$F_N^*(\lambda, W_N) = \frac{C_0\lambda}{2\pi N} + \frac{1}{\pi N}\sum_{\nu=1}^{h_N} C_\nu\frac{\sin\nu\lambda}{\nu}\Bigl(1 - \frac{\nu}{h_N}\Bigr).$$

This estimate of the spectral distribution function is closely related to an estimate
of the spectral density given by Bartlett [1]. It is nondecreasing and does
not require the computation of all the $C_\nu$'s.
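The Fejér-weighted estimate can be sketched as follows, together with a numerical check that it is nondecreasing. The function names, the simulated series, the seed and the truncation point are assumptions made for illustration only.

```python
import math
import random

def lagged_products(y, max_lag):
    """Return [C_0, ..., C_max_lag] with C_v = sum_j y_j * y_{j+v}."""
    n = len(y)
    return [sum(y[j] * y[j + v] for j in range(n - v)) for v in range(max_lag + 1)]

def fejer_spectral_df(c, n, lam, h):
    """C_0 lam/(2 pi N) + (1/(pi N)) sum_{v=1}^{h} C_v (sin(v lam)/v)(1 - v/h)."""
    total = c[0] * lam / (2.0 * math.pi * n)
    # The term v = h has weight zero, so only lags below h are needed.
    total += sum(c[v] * math.sin(v * lam) / v * (1.0 - v / h)
                 for v in range(1, h)) / (math.pi * n)
    return total

rng = random.Random(0)                       # hypothetical simulated series
y = [rng.gauss(0.0, 1.0) for _ in range(64)]
h = 8                                        # hypothetical truncation point h_N
c = lagged_products(y, h - 1)
grid = [math.pi * k / 200 for k in range(201)]
values = [fejer_spectral_df(c, len(y), lam, h) for lam in grid]
```

The monotonicity holds because the derivative of the estimate is the periodogram smoothed by the nonnegative Fejér kernel; only the lags below $h_N$ enter the computation.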


10. Alternative statistics and the corresponding limit theorems. In some instances
one might prefer to consider statistics other than $\max_{0\le\lambda\le\pi}|F_N^*(\lambda) - F(\lambda)|$
in analyzing time series. We shall therefore consider some alternative statistics
in this section.

Consider the linear space consisting of continuous functions $c = c(\lambda)$ on
$0 \le \lambda \le \pi$ with the norm $\|c\| = \sup_{0\le\lambda\le\pi}|c(\lambda)|$. Consider a functional $g(c)$
uniformly continuous in this topology.

THEOREM 12. Under the conditions of Theorem 6 we have

$$\lim_{N\to\infty} P\{g(\sqrt{N}[F_N^*(\lambda) - F(\lambda)]) \le a,\ 0 \le \lambda \le \pi\} = P\{g(\eta(\lambda)) \le a,\ 0 \le \lambda \le \pi\}.$$


PROOF. Writing

$$\sqrt{N}[F_N^*(\lambda) - F(\lambda)] = s_{N,k}(\lambda) + r_{N,k}(\lambda)$$

as before we note that $\|r_{N,k}(\lambda)\| < \varepsilon$ if $k$ is chosen sufficiently large. Hence we
commit only a small error by considering instead the probability of the event

$$\{g(s_{N,k}(\lambda)) \le a,\ 0 \le \lambda \le \pi\}$$

which is a closed set in $(k + 1)$-space. This probability converges to

$$P\{g(s_k(\lambda)) \le a,\ 0 \le \lambda \le \pi\}$$

as $N \to \infty$. But we can choose $k$ so large that $\|s_k(\lambda) - \eta(\lambda)\| < \delta$ with probability
$1 - \delta$. This together with the uniform continuity of $g(c)$ proves the
theorem.
As an example we can choose

$$g(c) = \int_0^{\pi}|c(\lambda)|^2\,d\nu(\lambda)$$







where $\nu(\lambda)$ is bounded and nondecreasing. This will give a statistic of the von
Mises type.


11. Acknowledgement. We are indebted to J. L. Doob who suggested the 
problem that led to the investigation resulting in this paper. 
A STOCHASTIC MODEL WITH APPLICATIONS TO LEARNING¹
ROBERT R. BUSH AND FREDERICK MOSTELLER
Harvard University


Summary. A stochastic model designed for analyzing data with changing 
probabilities is presented. On each of a series of trials one of two alternatives 
occurs and the probabilities of occurrence are changed from time to time by 
events. Corresponding to each class of events is an operator which represents a 
linear transformation on the probabilities of the two alternatives. Cases of 
fixed event probabilities and of changing event probabilities are considered. 
Recurrence formulas for moments of the resulting distributions of probabilities 
are provided. These formulas are often tedious to apply, but for the first and 
second moments several bounds are provided; these bounds are relatively easy 
to compute. 

The problem of estimating the parameters of the model is discussed. No general 
solution is obtained but simplifying assumptions lead to interesting special cases 
for which detailed procedures of parameter estimation are presented. One such 
special case arises when there are two event operators which commute, implying 
that the operators have equal limit points or that one operator is the identity 
operator. The method of maximum likelihood is applied to this case. Another
special case, which arises when the slope parameters of the two operators are 
equal, is discussed in Section 8. 

Applications of the model and estimation procedures to certain kinds of data 
on animal and human learning are described. The examples given are experiments 
on verbal learning, the avoidance training of dogs, the reward training of rats in 
a simple T-maze, and the behavior of human subjects in a two-choice situation. 


1. Basic concepts and definitions. During the past three years, the authors 
have been developing a mathematical model for describing a variety of experi- 
ments on animal and human learning [2], [3]. This model is closely related to the 
one developed by Estes [5] and to the more recent work of Miller and McGill 
[9]. These models have led quite naturally into a study of a class of stochastic 
processes, which may be viewed as Markov chains with an infinite number of 
states. In applying the model to the analysis of experimental data, a number of 
problems in statistical estimation have arisen. In this paper, therefore, we present 
a summary of the main mathematical results obtained and a discussion of some 
estimation procedures we have found useful. 
A learning process, as the term is used here, involves systematic changes in
behavior; one type of behavior may become more frequent and another type 
of behavior may become less so. We shall describe this learning process in a situ- 
ation where a choice of a number of given alternatives occurs periodically. Each 
occasion on which there is an opportunity for making a choice will be called a 
trial. Typically, one observes that a particular alternative occurs more and more
frequently (this we call learning) until the system stabilizes so that no more
average changes in behavior occur (this we call the completion of learning). In
later sections we discuss applications of our model to problems in learning, but 
we will describe the basic structure of the model in somewhat more general 
terms. 

We consider a set of mutually exclusive and exhaustive alternatives, $A_1$,
$A_2, \ldots, A_r$. On each trial one and only one of these alternatives will occur.
On each trial we define a set of $r$ probabilities, $p_1, p_2, \ldots, p_r$, corresponding to
the $r$ alternatives. The probability $p_i$ is then the probability that the $i$th
alternative will occur on the trial in question. We assume that all the available
information about what alternative occurs on that trial is given by the set of r 
probabilities. The alternatives which occur on trials previous to trial n, for ex- 
ample, do not influence the outcome of trial n except insofar as they may have 
determined the probabilities for that trial. On each trial the probabilities must 
sum to unity since we have taken the r alternatives to be mutually exclusive and 
exhaustive. 

The set of probabilities $p_i$ are altered from time to time by certain events,
$E_1, E_2, \ldots, E_t$. Corresponding to each event there is a mathematical operator
$T_j$ ($j = 1, 2, \ldots, t$) which operates on the set of $r$ probabilities whenever event
$E_j$ occurs. We next give particular representations of these operators.


2. The event operators. It is explicitly assumed that the operators which
correspond to the $t$ events are linear. Thus we may represent the set of $r$ probabilities
by a column vector and each operator $T_j$ by an $r \times r$ matrix. In the
remainder of this paper we will discuss only the case of two alternatives $A_1$ and
$A_2$ and so we need only a single probability variable, $p$, the probability associated
with $A_1$, because the probabilities of the two alternatives always sum to
unity. Thus we may dispense with the matrix machinery and write for the upper
element of the transformed probability vector,

$$(1)\qquad Q_j p = a_j + \alpha_j p,$$

where the $a_j$ and $\alpha_j$ are parameters which are restricted only by the requirement
that the probabilities must always be in the closed interval from zero to unity.
This means that $0 \le a_j \le 1$ and $-a_j \le \alpha_j \le 1 - a_j$. The operators $Q_j$ are not
(homogeneous) linear operators, though derived from linear matrix operators.

These operators may be applied to an operand $p$ more than once. When $Q_j$
is applied twice we obtain


(2)    $Q_j^2 p = a_j + \alpha_j (Q_j p) = a_j (1 + \alpha_j) + \alpha_j^2 p.$


When $Q_j$ is applied to $p$ a total of $n$ times, it may easily be shown that


(3)    $Q_j^n p = \lambda_j - (\lambda_j - p)\,\alpha_j^n,$


where


(4)    $\lambda_j = a_j / (1 - \alpha_j).$


When the magnitude of $\alpha_j$ is less than unity, the term in $\alpha_j^n$ of equation (3)
approaches zero as $n$ gets large, and so in this case $\lambda_j$ is the asymptotic value of
the operation $Q_j^n p$.
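
The agreement between direct iteration of equation (1) and the closed form of
equations (3) and (4) can be checked numerically. The following is a minimal
sketch; the function names and the parameter values are illustrative
assumptions and are not taken from the paper.

```python
def apply_operator(p, a, alpha):
    """One application of Q p = a + alpha * p, equation (1)."""
    return a + alpha * p


def apply_n_times(p, a, alpha, n):
    """Apply the same operator n times by direct iteration."""
    for _ in range(n):
        p = apply_operator(p, a, alpha)
    return p


def closed_form(p, a, alpha, n):
    """Equation (3): Q^n p = lam - (lam - p) * alpha**n, with lam from (4)."""
    lam = a / (1.0 - alpha)
    return lam - (lam - p) * alpha ** n


if __name__ == "__main__":
    a, alpha, p0 = 0.3, 0.6, 0.1  # illustrative values only
    for n in (1, 5, 20):
        print(n, apply_n_times(p0, a, alpha, n), closed_form(p0, a, alpha, n))
```

For $|\alpha| < 1$ both computations approach $\lambda = a/(1 - \alpha)$ as $n$ grows.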

The $t$ operators $Q_j$ may be applied to $p$ in various orders, corresponding to the
orders of occurrence of the $t$ events. If we know in advance the particular sequence
of events, it is a simple matter to compute the successive values of probability.
Our main interest, however, is in problems in which the precise sequence of events
is unknown. For any sequence of the $t$ events and for any initial value of
probability, it may be shown that the probability will ultimately lie between two
limits; this we have called the trapping theorem. Corresponding to the $t$
operators will be a set of $t$ limits, $\lambda_j$, given by equations (4). The trapping theorem
states that the asymptotic value of probability from any sequence will lie in the
interval from $\min(\lambda_j)$ to $\max(\lambda_j)$, inclusive, these being the least and largest
cluster points, respectively, provided only that $0 \le \alpha_j < 1$ for all $j$. Our proof
of this theorem is elementary but rather lengthy and will not be given here. The
point is that if the starting value of $p$ is between the limits, the sequence of $p$'s
will forever remain there. If the starting value of $p$ is outside the limits, the
sequence will ultimately be trapped inside the interval or else tend to one of the
limits monotonically.
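
The behavior described by the trapping theorem can be observed on a simulated
sequence of events. The sketch below is only an illustration, not a proof. It
assumes that events are drawn uniformly at random (the theorem itself makes no
such assumption), and the names `simulate_sequence` and `limits` are
hypothetical.

```python
import random


def simulate_sequence(p0, operators, n_events, seed=0):
    """Apply randomly chosen operators Q_j p = a_j + alpha_j * p; return all values of p."""
    rng = random.Random(seed)
    p, path = p0, [p0]
    for _ in range(n_events):
        a, alpha = rng.choice(operators)  # assumption: events equally likely
        p = a + alpha * p
        path.append(p)
    return path


def limits(operators):
    """Smallest and largest of the limits lambda_j = a_j / (1 - alpha_j)."""
    lams = [a / (1.0 - alpha) for a, alpha in operators]
    return min(lams), max(lams)


if __name__ == "__main__":
    ops = [(0.3, 0.6), (0.01, 0.9)]  # illustrative (a_j, alpha_j) pairs
    lo, hi = limits(ops)
    path = simulate_sequence(0.95, ops, 200)
    print(lo, hi, path[-1])
```

A sequence started between the limits never leaves the interval, and one started
outside is eventually carried up to its boundary.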


3. Fixed event probabilities. In this section we will describe the process when
we do not know the precise sequence of events but know only the probabilities
$\pi_j$ that when an event occurs it will be event $E_j$. The set of $t$ event probabilities
$\pi_j$ are constant and sum to unity. After $n$ event occurrences there will be at
most $t^n$ possible sequences and hence at most $t^n$ possible values of $p$. The
probabilities of occurrences of these sequences will depend upon the $\pi_j$, of course.
We are interested in the properties of the distribution of values of $p$ for all $n$.
We may order the $t^n$ possible values of $p$ and label them $h = 1, 2, \ldots, t^n$. We
denote the $h$th value after $n$ events by $p_{h,n}$ and the probability that this value
will obtain by $P_{h,n}$. The mean of the distribution after $n$ events will be denoted
by $V_{1,n}$ and it is defined by


(5)    $V_{1,n} = \sum_{h=1}^{t^n} p_{h,n} P_{h,n} \qquad (n = 0, 1, \ldots).$


Now after the $(n + 1)$st event each of the $t^n$ values of $p$ will split into $t$ new
values of $p$. In particular, the $h$th value will split into $t$ new values $Q_j p_{h,n}$ with
corresponding weights $\pi_j P_{h,n}$. Thus, the mean will be given by


(6)    $V_{1,n+1} = \sum_{h=1}^{t^n} \sum_{j=1}^{t} \pi_j P_{h,n}\, Q_j p_{h,n}.$


Using equations (1) and (5) and the fact that the sum of the weights $P_{h,n}$ over
all $h$ is unity, it follows that


(7)    $V_{1,n+1} = \bar{a} + \bar{\alpha} V_{1,n},$


where the average or actuarial values of the parameters are given by


(8)    $\bar{a} = \sum_{j=1}^{t} \pi_j a_j,$


(9)    $\bar{\alpha} = \sum_{j=1}^{t} \pi_j \alpha_j.$


Equation (7) is a well known linear difference equation and has the solution


(10)    $V_{1,n} = \frac{\bar{a}}{1 - \bar{\alpha}} - \left[ \frac{\bar{a}}{1 - \bar{\alpha}} - V_{1,0} \right] \bar{\alpha}^{\,n}.$

The correspondence between equation (10) and equation (3) is clear if $\bar{a}/(1 - \bar{\alpha})$
is regarded as $\lambda$. (The expected operator, discussed in the next section, will yield
the correct value of $V_{1,n+1}$ from $V_{1,n}$.) Higher raw moments of the distribution
may be obtained in a similar way. The $k$th raw moment after $n$ events is


(11)    $V_{k,n} = \sum_{h=1}^{t^n} P_{h,n} (p_{h,n})^k.$


After the $(n + 1)$st event the $k$th raw moment is


(12)    $V_{k,n+1} = \sum_{h=1}^{t^n} \sum_{j=1}^{t} \pi_j P_{h,n} (Q_j p_{h,n})^k.$


After inserting the expressions for $Q_j p_{h,n}$ from equations (1) into expression (12)
and expanding the resulting expressions by the binomial theorem, we obtain


(13)    $V_{k,n+1} = \sum_{j=1}^{t} \sum_{i=0}^{k} \pi_j \binom{k}{i} a_j^{k-i} \alpha_j^{i} V_{i,n},$


where the $\binom{k}{i}$ are binomial coefficients, and where $V_{0,n} = 1$ for all $n$. Note that
the $k$th raw moment after $n + 1$ events is given in terms of all the raw moments
up through the $k$th after $n$ events. With the expressions (13) we may compute as
many moments of the distributions as we choose.
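
The recursion (13) and the closed form (10) can be compared with a direct sum
over all $t^n$ event sequences. The sketch below does this for small $n$. The
function names and the numerical values are illustrative assumptions, and the
brute-force enumeration is practical only for a few events.

```python
from itertools import product
from math import comb


def brute_force_moment(k, n, p0, ops, probs):
    """k-th raw moment of p after n events, summed over all t**n event sequences."""
    total = 0.0
    for seq in product(range(len(ops)), repeat=n):
        p, weight = p0, 1.0
        for j in seq:
            a, alpha = ops[j]
            p = a + alpha * p      # operator Q_j of equation (1)
            weight *= probs[j]     # fixed event probability pi_j
        total += weight * p ** k
    return total


def recursive_moments(k_max, n, p0, ops, probs):
    """Raw moments V_0..V_kmax after n events from the recursion (13)."""
    V = [p0 ** i for i in range(k_max + 1)]  # moments at n = 0; V[0] = 1
    for _ in range(n):
        V = [
            sum(pi * comb(k, i) * a ** (k - i) * alpha ** i * V[i]
                for (a, alpha), pi in zip(ops, probs)
                for i in range(k + 1))
            for k in range(k_max + 1)
        ]
    return V


def mean_closed_form(n, p0, ops, probs):
    """Equation (10) for the mean after n events."""
    a_bar = sum(pi * a for (a, _), pi in zip(ops, probs))
    alpha_bar = sum(pi * alpha for (_, alpha), pi in zip(ops, probs))
    lam = a_bar / (1.0 - alpha_bar)
    return lam - (lam - p0) * alpha_bar ** n


if __name__ == "__main__":
    ops, probs = [(0.3, 0.6), (0.01, 0.9)], [0.4, 0.6]  # illustrative values
    print(recursive_moments(3, 5, 0.2, ops, probs))
    print(mean_closed_form(5, 0.2, ops, probs))
```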

One may also compute the moments of the distribution of $p_{h,n}$ from a moment
generating function $\phi_n(\theta)$ defined by


(14)    $\phi_n(\theta) = \sum_{h=1}^{t^n} P_{h,n}\, e^{\theta p_{h,n}}.$


It is easy to show that


(15)    $\phi_{n+1}(\theta) = \sum_{j=1}^{t} \pi_j\, e^{\theta a_j} \phi_n(\theta \alpha_j).$


4. Identification of alternatives and events. In the preceding sections we have
intentionally avoided imposing any direct connection between the events and the
occurrences of alternatives from trial to trial. In many problems of practical
interest, the events which alter the probabilities immediately follow occurrences
of an alternative in a predetermined manner. Hence, in this section we will
identify the events with the alternatives; the event probabilities are then equal
to the values of $p$ and so are no longer constant. Since we are considering only
the case of two alternatives, we will have but two events.

After $n$ event occurrences, that is, after $n$ trials, there will be $2^n$ possible
values of $p$. The mean of these is still defined by equation (5), but the mean on
the next trial is


(16)    $V_{1,n+1} = \sum_{h=1}^{2^n} \{ P_{h,n}\, p_{h,n} (Q_1 p_{h,n}) + (1 - p_{h,n}) P_{h,n} (Q_2 p_{h,n}) \}.$


Using the operations defined by equations (1) and the definitions of the raw
moments, equations (5) and (11), we have after simplifications


(17)    $V_{1,n+1} = a_2 + (a_1 - a_2 + \alpha_2) V_{1,n} + (\alpha_1 - \alpha_2) V_{2,n}.$


We observe at once that the mean on the $(n + 1)$st trial depends upon the second
raw moment, $V_{2,n}$, for the preceding trial. Analogous to equation (13) we have
for the recurrence formula for the higher moments


(18)    $V_{k,n+1} = \sum_{i=0}^{k} \binom{k}{i} \left[ \left( a_1^{k-i} \alpha_1^{i} - a_2^{k-i} \alpha_2^{i} \right) V_{i+1,n} + a_2^{k-i} \alpha_2^{i} V_{i,n} \right].$


It is straightforward to write the recursion relation for the moment generating
function $\phi_n(\theta)$ of $p_n$. The derivation is equivalent to that for $V_{k,n+1}$. We get


(19)    $\phi_{n+1}(\theta) = e^{\theta a_2} \phi_n(\theta \alpha_2) + \frac{e^{\theta a_1}}{\alpha_1} \phi_n'(\theta \alpha_1) - \frac{e^{\theta a_2}}{\alpha_2} \phi_n'(\theta \alpha_2),$


where the primes on the $\phi$'s refer to derivatives with respect to $\theta$. Thus far we
have found this relation more tantalizing than useful.

We see that the $k$th moment on the $(n + 1)$st trial depends on the $(k + 1)$st
moment on the $n$th trial. This fact makes computations exceedingly difficult.
As a result we have developed some approximations and bounds which are much
easier to compute.
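
For a small number of trials the distribution can still be enumerated exactly,
which gives a direct check of equation (17). The following sketch lists all
$2^n$ values of $p$ with their weights; the function names and parameter values
are illustrative assumptions.

```python
def exact_distribution(n, p0, op1, op2):
    """All (value, weight) pairs of p after n trials when event j follows alternative A_j."""
    dist = [(p0, 1.0)]
    for _ in range(n):
        new = []
        for p, w in dist:
            new.append((op1[0] + op1[1] * p, w * p))          # A_1 occurred, Q_1 applied
            new.append((op2[0] + op2[1] * p, w * (1.0 - p)))  # A_2 occurred, Q_2 applied
        dist = new
    return dist


def raw_moment(dist, k):
    """k-th raw moment of an enumerated distribution."""
    return sum(w * p ** k for p, w in dist)


def mean_next_from_17(dist, op1, op2):
    """Mean on the next trial from equation (17), using V_1 and V_2 of the current trial."""
    (a1, al1), (a2, al2) = op1, op2
    return a2 + (a1 - a2 + al2) * raw_moment(dist, 1) + (al1 - al2) * raw_moment(dist, 2)


if __name__ == "__main__":
    op1, op2 = (0.3, 0.6), (0.01, 0.9)  # illustrative (a, alpha) pairs
    d5 = exact_distribution(5, 0.2, op1, op2)
    d6 = exact_distribution(6, 0.2, op1, op2)
    print(raw_moment(d6, 1), mean_next_from_17(d5, op1, op2))
```

The number of values doubles on every trial, which is the practical reason for
seeking bounds rather than exact moments.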

The first approximation is an obvious one. We have called it the expected
operator approximation. An expected operator, $\bar{Q}$, is defined by


(20)    $V_{1,n+1} \cong \bar{Q} V_{1,n} = V_{1,n}\, Q_1 V_{1,n} + (1 - V_{1,n})\, Q_2 V_{1,n}.$


Using the definitions of equations (1) we would obtain the approximation


(21)    $V_{1,n+1} \cong a_2 + (a_1 - a_2 + \alpha_2) V_{1,n} + (\alpha_1 - \alpha_2) V_{1,n}^2.$


If we compare this approximate result with equation (17) we see that $V_{1,n}^2$ has
replaced $V_{2,n}$ in the exact equation. This means that the expected operator
approximation behaves as if the variance of the distribution were zero. This
behavior is clearly wrong since we know that the density is not concentrated at a
point except possibly on the zero-th trial. However, as will be shown below, the
expected operator will lead to a bound on the mean, $V_{1,n}$.

The first set of bounds on the mean which we present are obtained from bounds
on the second raw moment, $V_{2,n}$. The lower bound on the second raw moment
follows from the fact that the variance is never negative:


(22)    $V_{2,n} = V_{1,n}^2 + \sigma_n^2 \ge V_{1,n}^2.$


The upper limit on $V_{2,n}$ is a bit more trouble to obtain. Consider first a
distribution $g(z)$ on the interval $0 \le z \le 1$, having mean $U_{1,n}$ and second raw moment
$U_{2,n}$. We know at once that


(23)    $U_{2,n} \le U_{1,n}.$


We now transform this distribution to the interval $\mu_2 \le x \le \mu_1$ by letting


(24)    $z = \frac{x - \mu_2}{\mu_1 - \mu_2},$


and find that


(25)    $V_{2,n} \le (\mu_1 + \mu_2) V_{1,n} - \mu_1 \mu_2.$


We may take $\mu_1 = \lambda_1$ and $\mu_2 = \lambda_2$, provided that $\lambda_2 < \lambda_1$, and obtain the desired
upper bound on $V_{2,n}$. These bounds on the second moment, inequalities (22)
and (25), may now be used in our recurrence relation (17) to obtain upper and
lower bounds on the means. We shall carry this out only for the asymptotic
distribution, for which we let $V_{1,n} = V_{1,n+1} = V_1$ and $V_{2,n} = V_{2,n+1} = V_2$. (Harris
has demonstrated [8] that the distribution of $p_n$ approaches a limiting
distribution independent of $p_0$ as $n \to \infty$ when $0 < \lambda_1, \lambda_2 < 1$ and $0 < \alpha_1, \alpha_2 < 1$.) If we
then introduce the abbreviations


(26)    $A = a_1 - a_2 + \alpha_2 - 1, \qquad B = \alpha_1 - \alpha_2,$


equation (17) can be written as:


(27)    $A V_1 + B V_2 + a_2 = 0.$


We next insert the lower bound on $V_2$ from (22) into equation (27) and obtain
for $B > 0$,


(28)    $B V_1^2 + A V_1 + a_2 \le 0.$



When $B < 0$, the direction of the inequality in (28) is reversed. We denote the
quadratic expression in (28) by $q(V_1)$ and note that $q(0) = a_2$ which is positive.
Further, from definitions (26) we see that $q(1) = a_1 + \alpha_1 - 1$, which from
restrictions on the parameters discussed in Section 2 is a negative quantity. Thus,
$q(V_1)$ is positive at zero and negative at unity and so has but one root between
zero and unity. If this root is called $Y$, we have $V_1 \ge Y$ for $B > 0$ and $V_1 \le Y$
for $B < 0$. The bound $Y$ is obtained from the expected operator approximation
discussed above.

The upper bound on $V_2$, given by inequality (25), with $\mu_1 = \lambda_1$ and $\mu_2 = \lambda_2$,
may also be inserted in equation (27). For $B > 0$ we have


(29)    $[A + (\lambda_1 + \lambda_2) B]\, V_1 \ge B \lambda_1 \lambda_2 - a_2.$


From the definitions (26) it may be shown that the coefficient of $V_1$ on the left
side of (29) is always negative or zero and so we have


(30)    $V_1 \le \frac{B \lambda_1 \lambda_2 - a_2}{A + (\lambda_1 + \lambda_2) B}.$


When $B < 0$, the directions of the inequalities in (29) and (30) are reversed.
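
The two bounds on the asymptotic mean obtained from the second raw moment are
easily evaluated. The sketch below computes the root $Y$ of the quadratic in (28)
and the quotient in (30). The function name is hypothetical, and the code
assumes that the quadratic has a root in the unit interval and that the
denominator of (30) does not vanish.

```python
from math import sqrt


def second_moment_bounds(a1, alpha1, a2, alpha2):
    """Return (lower, upper) bounds on the asymptotic mean V_1 from (28) and (30)."""
    A = a1 - a2 + alpha2 - 1.0          # definitions (26)
    B = alpha1 - alpha2
    if B == 0.0:
        v = -a2 / A                     # equation (27) is then linear in V_1
        return v, v
    lam1 = a1 / (1.0 - alpha1)          # limits from equation (4)
    lam2 = a2 / (1.0 - alpha2)
    # Root Y of q(V) = B V^2 + A V + a2 lying between zero and unity.
    disc = sqrt(A * A - 4.0 * B * a2)
    roots = [(-A + disc) / (2.0 * B), (-A - disc) / (2.0 * B)]
    y = next(r for r in roots if 0.0 <= r <= 1.0)
    # Quotient of (30), from the upper limit (25) on the second raw moment.
    z = (B * lam1 * lam2 - a2) / (A + (lam1 + lam2) * B)
    # For B > 0: Y <= V_1 <= z; for B < 0 both inequalities are reversed.
    return (y, z) if B > 0 else (z, y)


if __name__ == "__main__":
    print(second_moment_bounds(0.3, 0.6, 0.01, 0.9))
```

With $a_1 = 0.3$, $\alpha_1 = 0.6$, $a_2 = 0.01$, $\alpha_2 = 0.9$ the function returns
approximately (0.500, 0.682).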
An improved pair of bounds on the means $V_{1,n}$ may be obtained from upper and
lower bounds on the third raw moments $V_{3,n}$. We consider again a distribution
$g(z)$ on the interval $0 \le z \le 1$. Using the Schwarz inequality


(31)    $\left( \int \xi^2 \, dz \right) \left( \int \eta^2 \, dz \right) \ge \left( \int \xi \eta \, dz \right)^2,$


and letting $\eta^2 = z\, g(z)$ and $\xi^2 = z^3 g(z)$, one can readily show that


(32)    $U_{3,n} \ge U_{2,n}^2 / U_{1,n},$


where $U_{k,n}$ is the $k$th raw moment of the distribution $g(z)$ on the $n$th trial.
After transforming this distribution to the interval $\lambda_2 \le x \le \lambda_1$ by equation
(24) we obtain


(33)    $V_{3,n} \ge \lambda_2 V_{2,n} + \frac{(V_{2,n} - \lambda_2 V_{1,n})^2}{V_{1,n} - \lambda_2}.$


The upper bound on $V_{3,n}$ may be found in the same manner. We let
$\eta^2 = (1 - z)\, g(z)$ and $\xi^2 = z^2 (1 - z)\, g(z)$ in the Schwarz inequality and then
transform to the interval $\lambda_2 \le x \le \lambda_1$ to obtain


(34)    $V_{3,n} \le \lambda_1 V_{2,n} - \frac{(V_{2,n} - \lambda_1 V_{1,n})^2}{\lambda_1 - V_{1,n}}.$

Inequalities (33) and (34) may now be used in the recurrence relations (17)
and (18) to obtain bounds on $V_{1,n}$. Equation (18) for $k = 2$ is


(35)    $V_{2,n+1} = a_2^2 + (2 a_2 \alpha_2 + a_1^2 - a_2^2) V_{1,n} + (\alpha_2^2 + 2 a_1 \alpha_1 - 2 a_2 \alpha_2) V_{2,n} + (\alpha_1^2 - \alpha_2^2) V_{3,n}.$


The second moment $V_{2,n}$ may be eliminated from this expression and equation
(17). Further, equation (17) may be used to eliminate the second moment from
inequalities (33) and (34). In this way one can write down a recurrence formula
in the means alone for the two desired bounds. The reader will be spared the
sight of the final result.

Numerical computations of the two sets of bounds on $V_{1,n}$ discussed above
have been carried out for 25 trials with assumed values of the parameters.
In Fig. 1 we show the results. Also in Table I we show the results of several
such computations of bounds on the asymptotic mean.


[Figure 1. Vertical axis: probability ($p$); horizontal axis: trials ($n$).]
Fig. 1
Bounds on the probability distribution means, $V_{1,n}$, versus trials with the parameter values
$a_1 = 0.3$, $\alpha_1 = 0.6$, $a_2 = 0.01$, $\alpha_2 = 0.9$. Curve A is the expected operator bound which is in this
case an upper bound. Curves B and C are the upper and lower bounds, respectively, obtained
from maximizing and minimizing the third moment about the mean. Curve D is the lower bound
obtained by maximizing the variance. The small circles represent the mean probabilities of 84
Monte Carlo runs made with the above parameter values.


TABLE I
Bounds on the asymptotic mean, $V_{1,\infty}$, for seven numerical examples. The limits
$\lambda_1$ and $\lambda_2$ are the asymptotes obtained by applying one operator only. The bounds
on $V_{1,\infty}$ were obtained by maximizing and minimizing the second raw moment,
$V_2$, and the third raw moment, $V_3$.

  Parameter Values     Limits     Bounds on $V_{1,\infty}$
  $a_1$    ...         ...        From $V_2$     From $V_3$

  .300     ...         ...        .500-.682      .655-.676
  .300     ...         ...        .112-.668      .418-.658
  .300     ...         ...        .013-.667      .093-.654
  .396     ...         ...        .394-.987      .967-.986
  .396     ...         ...        .184-.961      .723-.878
  .360     ...         ...        .09?-.718      .637-.657
  .300     ...         ...        0-.667         0-.656


In addition to the computations of bounds, we have used the Monte Carlo
method of making approximate calculations. This involves using a random
number table for making decisions about which operator, $Q_1$ or $Q_2$, to apply to the
probability. The means of 84 such runs for 25 trials are also shown in Fig. 1.
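
A Monte Carlo computation of this kind can be sketched as follows. The sketch
substitutes a pseudo-random number generator for the random number table; the
function name, the seed, and the starting value $p_0 = 0.5$ are assumptions made
only for illustration.

```python
import random


def monte_carlo_means(n_runs, n_trials, p0, op1, op2, seed=0):
    """Mean of p on each trial (0..n_trials) over n_runs simulated sequences."""
    rng = random.Random(seed)
    totals = [0.0] * (n_trials + 1)
    for _ in range(n_runs):
        p = p0
        totals[0] += p
        for n in range(1, n_trials + 1):
            if rng.random() < p:   # alternative A_1 occurs with probability p
                a, alpha = op1     # so operator Q_1 is applied
            else:                  # otherwise A_2 occurs
                a, alpha = op2     # and operator Q_2 is applied
            p = a + alpha * p
            totals[n] += p
    return [s / n_runs for s in totals]


if __name__ == "__main__":
    # Operators given as (a, alpha); p0 = 0.5 is a hypothetical starting value.
    means = monte_carlo_means(84, 25, 0.5, (0.3, 0.6), (0.01, 0.9))
    for n, m in enumerate(means):
        print(n, round(m, 3))
```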


5. The estimation problem. Most mathematical models involve one or more
unknown parameters; these parameters may be related to experimental variables
but data from experiments must provide information about the values of the
parameters. The model described in Section 4 has a total of five parameters when
two alternatives and two events are considered: an initial probability, $p_0$,
and the parameters $a_1$, $a_2$, $\alpha_1$, and $\alpha_2$ contained in the definitions of the
operators $Q_1$ and $Q_2$. When the model is applied to a particular experimental problem,
one must estimate these five parameters from the data. In this and following
sections we will discuss some of these estimation problems. We shall restrict our
attention to two alternatives and two events, and to the case when the events are
identified with the alternatives as in Section 4.

There is a crucial question as to how many parameters the model can tolerate
in the face of particular kinds of data. It appears to us that five parameters are
too many for the kinds of data we have been studying. The obvious approach is to
avoid using the model in full generality but to make special assumptions for
specific applications, that is, to let some of the parameters be zero or unity or to
set up relations between certain parameters from considerations such as
symmetry. In making such special assumptions in our applications to learning
experiments, we have been guided by current psychological theories of learning such as
reinforcement theory and association theory. However, we are interested in
workable methods for estimating parameters in the general case. Our experience
to date has led us to believe that the estimation problem is a very untidy one if
there are more than two parameters involved. Therefore, in the sections which
follow we will discuss only special cases where three of the five parameters are
eliminated or are assumed to be known.


6. Estimation procedures when the operators commute. The estimation
problem is much simplified when the two operators, $Q_1$ and $Q_2$, commute. From
equation (1), it is easily shown that $Q_1$ and $Q_2$ commute if and only if


(36)    $a_1 (1 - \alpha_2) = a_2 (1 - \alpha_1).$


This condition is fulfilled when one or both of the two operators is the identity
operator, that is, if $a_1 = 0$ and $\alpha_1 = 1$, or if $a_2 = 0$ and $\alpha_2 = 1$, or both. Otherwise
the operators commute only if the trapping theorem limits, $\lambda_1$ and $\lambda_2$, defined by
equation (4) are equal to one another. By setting $\lambda_1 = a_1/(1 - \alpha_1) = \lambda_2 =
a_2/(1 - \alpha_2) = \lambda$, the two operators become


(37)    $Q_1 p = \lambda (1 - \alpha_1) + \alpha_1 p,$
        $Q_2 p = \lambda (1 - \alpha_2) + \alpha_2 p.$





568 ROBERT R. BUSH AND FREDERICK MOSTELLER 


The cases for which $Q_1$ or $Q_2$ is the identity operator, or both, may be considered
as special cases of equations (37), since one may set $\alpha_1 = 1$ or $\alpha_2 = 1$ or both.
Hence, the most general operations for which $Q_1 Q_2 = Q_2 Q_1$ are described by
equations (37).
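
The commutation condition (36) is easily verified by composing the two
operators in both orders. In the sketch below each operator is represented by
the pair $(a, \alpha)$; the function names are hypothetical.

```python
def compose(op_outer, op_inner):
    """Return (a, alpha) of the operator p -> Q_outer(Q_inner p)."""
    a_o, al_o = op_outer
    a_i, al_i = op_inner
    return (a_o + al_o * a_i, al_o * al_i)


def commute(op1, op2, tol=1e-12):
    """Condition (36): a_1 (1 - alpha_2) = a_2 (1 - alpha_1)."""
    (a1, al1), (a2, al2) = op1, op2
    return abs(a1 * (1 - al2) - a2 * (1 - al1)) <= tol


def operator_with_limit(lam, alpha):
    """Equations (37): Q p = lam (1 - alpha) + alpha p, returned as (a, alpha)."""
    return (lam * (1 - alpha), alpha)


if __name__ == "__main__":
    q1 = operator_with_limit(0.8, 0.70)   # illustrative common limit 0.8
    q2 = operator_with_limit(0.8, 0.95)
    print(commute(q1, q2), compose(q1, q2), compose(q2, q1))
    print(commute((0.3, 0.6), (0.01, 0.9)))  # unequal limits: no commutation
```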
We now make one further restriction and then develop a scheme for estimating
the remaining parameters. We take $\lambda = 1$ to obtain


(38)    $Q_1 p = 1 - \alpha_1 + \alpha_1 p,$
        $Q_2 p = 1 - \alpha_2 + \alpha_2 p.$


We wish to estimate $\alpha_1$, $\alpha_2$, and $p_0$, the initial value of $p$, from actual data.
Now on trial $n$ there will have been some number $k$ of previous occurrences of
$A_1$ and so the probability $q_{n,k}$ of an $A_2$ occurrence on trial $n$ is


(39)    $q_{n,k} = 1 - Q_1^{k} Q_2^{n-k} p_0.$


From equations (38) we may easily show that


(40)    $q_{n,k} = \alpha_1^{k} \alpha_2^{n-k} q_0,$


where $q_0 = 1 - p_0$ is the initial probability of an $A_2$ occurrence. In applications
to learning data, two further sets of restrictions are of interest. First, if $\alpha_2 = 1$,
we have


(41)    $q_{n,k} = \alpha_1^{k} q_0.$


Second, if we consider only the data for which $k = 0$, we have


(42)    $q_n = \alpha_2^{n} q_0.$


Both sets of restrictions yield equations of the form


(43)    $q_\nu = \alpha^{\nu} q_0,$


and so we will be interested in the estimation problems which arise from
equations of this type. In general terms, $q_\nu$ is the probability of an $A_2$ occurrence for
the specified value of $\nu$.
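
Equation (40) may be checked against a direct application of the operators of
equations (38). The following sketch does so; the function names and the
numerical values are illustrative assumptions.

```python
def q_by_operators(n, k, p0, alpha1, alpha2):
    """1 - Q_1^k Q_2^(n-k) p0, using the operators of equations (38)."""
    p = p0
    for _ in range(n - k):
        p = 1 - alpha2 + alpha2 * p   # operator Q_2
    for _ in range(k):
        p = 1 - alpha1 + alpha1 * p   # operator Q_1
    return 1 - p


def q_closed_form(n, k, p0, alpha1, alpha2):
    """Equation (40): alpha1**k * alpha2**(n - k) * q0, with q0 = 1 - p0."""
    return alpha1 ** k * alpha2 ** (n - k) * (1 - p0)


if __name__ == "__main__":
    print(q_by_operators(10, 3, 0.1, 0.70, 0.95))
    print(q_closed_form(10, 3, 0.1, 0.70, 0.95))
```

Because the operators commute, the order in which the $k$ applications of $Q_1$
and the $n - k$ applications of $Q_2$ are made does not affect the result.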

The data will be represented by random variables $x_{i,\nu}$ which are considered
to be binomial observations (0 or 1). The index $i$ is used to indicate the $i$th
observation for the specified value of $\nu$. If $A_1$ occurs on the $i$th observation for
the given value of $\nu$, then $x_{i,\nu} = 1$, while if $A_2$ occurs then $x_{i,\nu} = 0$. The total
number of binomial observations available in the data for a particular value
of $\nu$ may not be the same for all values of $\nu$ and so we denote that number by
$N_\nu$. We further define $x_\nu$ to be the number of times $A_1$ occurs during the $N_\nu$
observations, that is,


(44)    $x_\nu = \sum_{i=1}^{N_\nu} x_{i,\nu}.$


The expected value of $x_{i,\nu}$ is of course $1 - q_\nu$, that is, $q_\nu$ is the probability that
$x_{i,\nu} = 0$.

The discussion above might be compared with a special bioassay model in
which doses are given at levels $D_0, D_1, \ldots, D_\Omega$ (perhaps equally spaced in the
logarithm), with a model for proportion living at dose $D_\nu$ to be $q_0 \alpha^{\nu}$. We want
to estimate $q_0$ and $\alpha$. In the bioassay case the number of animals at a dose is
$N_\nu$, usually fixed in advance unless some sequential approach is used. In the
stochastic model presented here, the $N_\nu$ are random variables.


6a. Maximum likelihood estimates of $\alpha$ and $p_0$. In this section we obtain some
maximum likelihood estimates of the parameters $\alpha$ and $p_0$. We have not
investigated the general question of the efficiency of maximum likelihood
procedures when applied to stochastic processes. Though such investigations
are beyond the scope of this paper, they clearly need to be made. We will proceed
in the standard manner, setting aside these more general considerations.

We wish to write down an expression for the likelihood of obtaining an
observed set of data. First, the likelihood $P_\nu$ of obtaining $x_\nu$ occurrences of $A_1$
and $N_\nu - x_\nu$ occurrences of $A_2$ in a given order for a particular value of $\nu$ is


(45)    $P_\nu = (1 - q_\nu)^{x_\nu} (q_\nu)^{N_\nu - x_\nu}.$


The likelihood, $P$, of obtaining the entire set of data, for $\nu = 0, 1, \ldots, \Omega$, is then


(46)    $P = \prod_{\nu=0}^{\Omega} P_\nu = \prod_{\nu=0}^{\Omega} (1 - q_\nu)^{x_\nu} (q_\nu)^{N_\nu - x_\nu}.$


We insert in this expression the value of $q_\nu$ from equation (43) and take the
logarithm to obtain


(47)    $\log P = \sum_{\nu=0}^{\Omega} \{ x_\nu \log (1 - \alpha^{\nu} q_0) + (N_\nu - x_\nu) \log (\alpha^{\nu} q_0) \}.$


We wish to obtain the simultaneous maximum likelihood estimates of $\alpha$ and
$q_0$ and so we maximize $\log P$ with respect to those two parameters. Setting equal
to zero the partial derivatives of $\log P$ with respect to $\alpha$ and $q_0$ leads to the
equations


(48)    $\sum_{\nu=0}^{\Omega} (N_\nu - x_\nu) = \sum_{\nu=0}^{\Omega} \frac{x_\nu \hat{\alpha}^{\nu} \hat{q}_0}{1 - \hat{\alpha}^{\nu} \hat{q}_0},$


(49)    $\sum_{\nu=0}^{\Omega} \nu (N_\nu - x_\nu) = \sum_{\nu=0}^{\Omega} \frac{\nu x_\nu \hat{\alpha}^{\nu} \hat{q}_0}{1 - \hat{\alpha}^{\nu} \hat{q}_0},$


where $\hat{\alpha}$ and $\hat{q}_0$ are the maximum likelihood estimates of $\alpha$ and $q_0$, respectively.
These equations, then, must be solved for $\hat{\alpha}$ and $\hat{q}_0$, but only numerical methods
are available in general. However, in certain applications to learning data
we have found convenient short cuts. We will discuss two of these.

In some applications it is possible to choose $\Omega$ such that $x_\nu$ for $\nu = 0, 1, \ldots, \Omega$
is some constant $R$ independent of $\nu$. The factor $x_\nu$ may then be taken out of
the sums on the right sides of equations (48) and (49), leaving the functions


(50)    $F(\hat{\alpha}, \hat{q}_0, \Omega) = \sum_{\nu=0}^{\Omega} \frac{\hat{\alpha}^{\nu} \hat{q}_0}{1 - \hat{\alpha}^{\nu} \hat{q}_0}$


and


(51)    $G(\hat{\alpha}, \hat{q}_0, \Omega) = \sum_{\nu=0}^{\Omega} \frac{\nu \hat{\alpha}^{\nu} \hat{q}_0}{1 - \hat{\alpha}^{\nu} \hat{q}_0}.$


We have tabulated these functions for ranges of $\hat{\alpha}$, $\hat{q}_0$, and $\Omega$, but these tables
are too lengthy to be included here. The values of the functions $F$ and $G$ are
computed from the data with the formulas


(52)    $F(\hat{\alpha}, \hat{q}_0, \Omega) = \frac{1}{R} \sum_{\nu=0}^{\Omega} (N_\nu - R),$


(53)    $G(\hat{\alpha}, \hat{q}_0, \Omega) = \frac{1}{R} \sum_{\nu=0}^{\Omega} \nu (N_\nu - R),$


and then the tables are used to obtain the corresponding values of $\hat{\alpha}$ and $\hat{q}_0$.

The second short cut we have found useful was developed for data in which
$q_0$ is known to be unity, or for which one is willing to assume $q_0 = 1$. Equation
(49) with $\hat{q}_0 = 1$ may then be solved for $\hat{\alpha}$, that is, we have


(54)    $\sum_{\nu=0}^{\Omega} \nu (N_\nu - x_\nu) = \sum_{\nu=0}^{\Omega} \frac{\nu x_\nu \hat{\alpha}^{\nu}}{1 - \hat{\alpha}^{\nu}}.$


The left side of this equation can be computed directly from the data but the
right side depends on both the $x_\nu$ from the data and the estimate $\hat{\alpha}$. However,
we have computed tables of the quantities $\nu \hat{\alpha}^{\nu}/(1 - \hat{\alpha}^{\nu})$ versus $\nu$ for fifty values
of $\hat{\alpha}$ = .50(.01).99, thereby greatly facilitating computations of the sum on the
right side of equation (54) for a given set of $x_\nu$ and a range of values of $\hat{\alpha}$. This
procedure is especially workable when we have a good preliminary value of the
correct $\hat{\alpha}$ and hence know the appropriate range of values to use. For most data
we have studied, $\hat{\alpha}$ is near unity and so we can expand $\hat{\alpha}^{\nu}$ in a power series about
unity and retain only the linear term, namely, $\hat{\alpha}^{\nu} \cong 1 - \nu (1 - \hat{\alpha})$. Using this
approximation in equation (54) and simplifying yields the simple formula


(55)    $\hat{\alpha} \cong 1 - \frac{\sum_{\nu=0}^{\Omega} x_\nu}{\sum_{\nu=0}^{\Omega} \nu N_\nu},$


which may be used to estimate $\alpha$ directly from the data without the use of
tables, or at least to obtain a preliminary value of $\hat{\alpha}$.
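
In place of tables, equation (54) can be solved by a simple bisection, since its
right side increases steadily with $\hat{\alpha}$. The sketch below does this and also
evaluates the approximation (55). The function names are hypothetical, the
lists `N` and `x` hold $N_\nu$ and $x_\nu$ for $\nu = 0, 1, \ldots, \Omega$, and the term for
$\nu = 0$ is omitted because it carries the factor $\nu = 0$.

```python
def rhs_54(alpha, x):
    """Right side of (54): sum over nu >= 1 of nu * x_nu * alpha**nu / (1 - alpha**nu)."""
    return sum(nu * x_nu * alpha ** nu / (1.0 - alpha ** nu)
               for nu, x_nu in enumerate(x) if nu > 0)


def solve_54(N, x, tol=1e-10):
    """Bisection for alpha-hat in (0, 1); the right side of (54) increases with alpha."""
    lhs = sum(nu * (N_nu - x_nu) for nu, (N_nu, x_nu) in enumerate(zip(N, x)))
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs_54(mid, x) < lhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_55(N, x):
    """Linearized estimate (55): 1 - sum(x_nu) / sum(nu * N_nu)."""
    return 1.0 - sum(x) / sum(nu * N_nu for nu, N_nu in enumerate(N))


if __name__ == "__main__":
    # Artificial data set equal to its expectation for alpha = 0.95, q0 = 1.
    alpha_true = 0.95
    N = [100.0] * 15
    x = [N_nu * (1.0 - alpha_true ** nu) for nu, N_nu in enumerate(N)]
    print(solve_54(N, x), approx_55(N, x))
```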

The asymptotic variances and covariances of the estimates $\hat{\alpha}$ and $\hat{q}_0$ may be
obtained by analogy with the procedure used in simpler problems. We illustrate
only for the case when $q_0$ is known and only $\alpha$ is being estimated. We take the
second partial derivative with respect to $\alpha$ of $\log P$ of equation (47) and obtain


(56)    $\frac{\partial^2 \log P}{\partial \alpha^2} = - \sum_{\nu=0}^{\Omega} \frac{\nu}{\alpha^2} \left\{ N_\nu - x_\nu + x_\nu \frac{(\nu - 1) q_0 \alpha^{\nu} + q_0^2 \alpha^{2\nu}}{(1 - \alpha^{\nu} q_0)^2} \right\}.$


We need to take the expected value of this second derivative. From the definition
of $x_\nu$, equation (44), we see that $E(x_\nu) = (1 - q_\nu) E(N_\nu) = (1 - \alpha^{\nu} q_0) E(N_\nu)$.
Thus, from (56) we have


(57)    $-E\left( \frac{\partial^2 \log P}{\partial \alpha^2} \right) = \sum_{\nu=0}^{\Omega} \frac{\nu^2 \alpha^{\nu - 2} q_0}{1 - \alpha^{\nu} q_0} E(N_\nu).$

The asymptotic variance is the reciprocal of this quantity, but in order to
compute it we need to evaluate the expected value of $N_\nu$ and this cannot be
done until the problem is more completely specified. In the general formulation
of the estimation problem which followed equation (43), we merely defined $N_\nu$
as the number of binomial observations available for a particular value of $\nu$,
but we left the distribution of $N_\nu$ unspecified. However, for the case of one
identity operator which led to equation (41), the index $\nu$ corresponds to $k$, the
number of previous occurrences of $A_1$, and so $N_\nu - 1$ is the number of $A_2$
occurrences between the $k$th and $(k + 1)$st $A_1$ occurrences. Hence $N_\nu$ has a negative
binomial distribution and expected value


(58)    $E(N_\nu) = \frac{R}{1 - \alpha^{\nu} q_0},$


where $R$ is the number of independent sequences in the data. When this
expression is used in equation (57) the asymptotic variance may be estimated. For
the case of commuting operators and data for which $k = 0$, equation (42) is
appropriate and $\nu$ corresponds to trial number $n$. If there are $R$ independent
sequences, then $N_\nu$ is the number of those sequences on trial $n$ for which $k = 0$.
It is readily shown that for this case


(59)    $E(N_\nu) = R\, \alpha_2^{n(n-1)/2} q_0^{n}.$


Again, the asymptotic variance may be estimated when this expression is used
in equation (57). For some cases it may be difficult to evaluate $E(N_\nu)$. We
presume that little violence will be done to the estimate of the variance by
replacing $E(N_\nu)$ with the observed $N_\nu$, providing the $N_\nu$ are not too small.
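
The computation of the asymptotic standard deviation from equations (57) and
(59) can be sketched as follows for the case $k = 0$. The function names and the
numerical values in the example are illustrative assumptions; the term for
$\nu = 0$ is skipped because it carries the factor $\nu^2 = 0$.

```python
def information_57(alpha, q0, expected_N):
    """Right side of (57): sum of nu^2 alpha^(nu-2) q0 / (1 - alpha^nu q0) * E(N_nu)."""
    total = 0.0
    for nu, EN in enumerate(expected_N):
        if nu == 0:
            continue  # zero contribution; also avoids 0/0 when q0 = 1
        total += nu ** 2 * alpha ** (nu - 2) * q0 / (1.0 - alpha ** nu * q0) * EN
    return total


def expected_N_59(alpha2, q0, R, n_max):
    """Equation (59): E(N_n) = R alpha2^(n(n-1)/2) q0^n for n = 0..n_max."""
    return [R * alpha2 ** (n * (n - 1) / 2) * q0 ** n for n in range(n_max + 1)]


def asymptotic_sd(alpha, q0, expected_N):
    """Square root of the reciprocal of (57)."""
    return information_57(alpha, q0, expected_N) ** -0.5


if __name__ == "__main__":
    EN = expected_N_59(0.95, 1.0, 30, 21)  # illustrative: 30 sequences, trials 0..21
    print(asymptotic_sd(0.95, 1.0, EN))
```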


6b. The value of $\nu$ when $A_1$ first occurs. It is instructive to consider a quantity
$h$, defined to be the value of $\nu$ when alternative $A_1$ first occurs. The probability
of an $A_1$ occurrence is $1 - q_\nu$ and $q_\nu$ is given by equation (43)


(43)    $q_\nu = \alpha^{\nu} q_0.$


The density function for $h$ is


(60)    $f(h) = q_0 (\alpha q_0)(\alpha^2 q_0)(\alpha^3 q_0) \cdots (\alpha^{h-1} q_0)(1 - \alpha^{h} q_0)$
             $= \alpha^{h(h-1)/2} q_0^{h} (1 - \alpha^{h} q_0), \qquad h = 1, 2, \ldots.$


The latter form of writing equation (60) is also correct for $h = 0$. In words, $f(h)$
is the probability that $A_1$ will first occur when $\nu = h$. This distribution might
be regarded as a more complicated version of the negative binomial, the
complication being that the probabilities depend on a variable $\nu$ as expressed by
equation (43). The expected value of $h$ is


(61)    $E(h) = \sum_{h=0}^{\infty} h f(h) = \sum_{h=0}^{\infty} h\, \alpha^{h(h-1)/2} q_0^{h} (1 - \alpha^{h} q_0)$
             $= \sum_{h=0}^{\infty} h\, \alpha^{h(h-1)/2} q_0^{h} - \sum_{h=0}^{\infty} h\, \alpha^{h(h+1)/2} q_0^{h+1}.$


This result may be simplified if we let $y = h + 1$ in the last sum, that is,


(62)    $E(h) = \sum_{h=0}^{\infty} h\, \alpha^{h(h-1)/2} q_0^{h} - \sum_{y=1}^{\infty} y\, \alpha^{y(y-1)/2} q_0^{y} + \sum_{y=1}^{\infty} \alpha^{y(y-1)/2} q_0^{y}.$


The first two sums on the right side of equation (62) cancel and so we have


(63)    $E(h) = \sum_{y=1}^{\infty} \alpha^{y(y-1)/2} q_0^{y}.$


For known values of $\alpha$ and $q_0$ the expected value, $E(h)$, may be computed. If
the maximum likelihood estimates $\hat{\alpha}$ and $\hat{q}_0$ are used in equation (63) for $\alpha$
and $q_0$, respectively, an estimate of $E(h)$ is obtained. From certain kinds of
data, a set of values of $h$ will be observed and the mean of these sample values
can be compared with the estimated $E(h)$.

The variance of $h$ may be obtained in a similar way. The result is


(64)    $\sigma^2(h) = 2 \sum_{y=1}^{\infty} y\, \alpha^{y(y-1)/2} q_0^{y} - E(h) - [E(h)]^2.$


By replacing $\alpha$ and $q_0$ with their maximum likelihood estimates, $\sigma^2(h)$ may
be estimated from equation (64) and the result compared to the variance of
an observed set of $h$'s. Conversely, a table of values of $E(h)$ and $\sigma^2(h)$ for
$0 < \alpha < 1$ and $0 < q_0 \le 1$ may be constructed and $\alpha$ and $q_0$ estimated from
the observed set of values of $h$ (method of moments).
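
The series (63) and (64) converge rapidly and may be summed directly. The
sketch below evaluates them and compares the results with moments computed
from the density (60); the function names and the truncation at 400 terms are
assumptions that suit values of $\alpha$ not too close to unity.

```python
def density(h, alpha, q0):
    """Equation (60): alpha^(h(h-1)/2) q0^h (1 - alpha^h q0), valid for h = 0, 1, 2, ..."""
    return alpha ** (h * (h - 1) // 2) * q0 ** h * (1.0 - alpha ** h * q0)


def series_terms(alpha, q0, n_terms):
    """Terms alpha^(y(y-1)/2) q0^y for y = 1..n_terms."""
    return [alpha ** (y * (y - 1) // 2) * q0 ** y for y in range(1, n_terms + 1)]


def mean_63(alpha, q0, n_terms=400):
    """Equation (63), truncated after n_terms terms."""
    return sum(series_terms(alpha, q0, n_terms))


def variance_64(alpha, q0, n_terms=400):
    """Equation (64), truncated after n_terms terms."""
    terms = series_terms(alpha, q0, n_terms)
    m = sum(terms)
    return 2.0 * sum(y * c for y, c in enumerate(terms, start=1)) - m - m * m


if __name__ == "__main__":
    alpha, q0 = 0.95, 1.0  # illustrative values
    direct_mean = sum(h * density(h, alpha, q0) for h in range(400))
    print(mean_63(alpha, q0), direct_mean, variance_64(alpha, q0))
```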

When $q_0 = 1$, equation (63) reduces to


(65)    $E(h) = \sum_{y=1}^{\infty} \alpha^{y(y-1)/2}.$


This series has to do with theta functions. According to Whittaker and Watson
[12] in their chapter on theta functions, this series was discussed by Jakob
Bernoulli, Ars Conjectandi (1713), p. 55. Bromwich [1] lists in his table of
contents "theta-series" (p. xii) and discusses our series and some related infinite
products in some examples (pp. 101, 116, 117).


6c. An unbiased estimate of $\alpha_1$. We next consider the problem of estimating
$\alpha_1$ of equation (40) when $\alpha_2$ and $q_0$ are known. We will utilize only that portion
of the data for which $k = 1$ and so we have probabilities


(66)    $q_n = \alpha_1 \alpha_2^{n-1} q_0,$


and observations $x_{i,n}$ for $n = 1, 2, \ldots$, and $i = 1, 2, \ldots, N_{n,1}$. We let


(67)    $x_{n,1} = \sum_{i=1}^{N_{n,1}} x_{i,n}.$


Thus, an unbiased estimate of $q_n$ is $1 - x_{n,1}/N_{n,1}$, and for each value of $n$ we
obtain an unbiased estimate $\hat{\alpha}_{1,n}$ of $\alpha_1$:


(68)    $\hat{\alpha}_{1,n} = \frac{1 - x_{n,1}/N_{n,1}}{\alpha_2^{n-1} q_0}.$


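We next wish to combine these estimates by taking a weighted mean, $\hat{\alpha}_1$,
over those values of $n$ for which $N_{n,1}$ is not zero, that is, we let


(69)    $\hat{\alpha}_1 = \frac{\sum_n W_n \hat{\alpha}_{1,n}}{\sum_n W_n}.$


We choose the weight $W_n$ to be inversely proportional to the variance of the
estimate $\hat{\alpha}_{1,n}$. (If the estimates $\hat{\alpha}_{1,n}$ are independent, this procedure minimizes
the variance of $\hat{\alpha}_1$.) The unbiased estimate, $1 - x_{n,1}/N_{n,1}$, of $q_n$ has variance


(70)    $\sigma^2 (1 - x_{n,1}/N_{n,1}) = \frac{q_n (1 - q_n)}{N_{n,1}} = \frac{\alpha_1 \alpha_2^{n-1} q_0 (1 - \alpha_1 \alpha_2^{n-1} q_0)}{N_{n,1}}.$


It follows then from equation (68) that the variance of $\hat{\alpha}_{1,n}$ is


(71)    $\sigma^2 (\hat{\alpha}_{1,n}) = \frac{\alpha_1 (1 - \alpha_1 \alpha_2^{n-1} q_0)}{N_{n,1} \alpha_2^{n-1} q_0},$


and so the weight $W_n$ may be taken to be


(72)    $W_n = \frac{N_{n,1} \alpha_2^{n-1}}{1 - \alpha_1 \alpha_2^{n-1} q_0}.$
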

The only difficulty in computing the weights from this result is that the
expression contains $\alpha_1$, the parameter being estimated. However, an iteration scheme
may be used to compute $\hat{\alpha}_1$; we begin with an unweighted mean of the $\hat{\alpha}_{1,n}$ and
compute the weights, use these to obtain $\hat{\alpha}_1$ from equation (69), use this value
of $\hat{\alpha}_1$ to recompute the weights, etc. However, we may replace $\alpha_1$ by $\hat{\alpha}_1$ in
equation (72) and substitute the expression for the $W_n$ into equation (69) and
obtain, after simplifications,


(73)    $\sum_n \frac{x_{n,1}}{1 - \hat{\alpha}_1 \alpha_2^{n-1} q_0} = \sum_n N_{n,1}.$


The right side of this equation is obtained at once from the data; the sum on
the left may be computed as a function of $\hat{\alpha}_1$, for the given values of $x_{n,1}$, $\alpha_2$,
and $q_0$, and the correct value of $\hat{\alpha}_1$ obtained by successive approximations.
It will be noted that the sum is a monotonic increasing function of $\hat{\alpha}_1$ for
$0 \le \hat{\alpha}_1 \le 1$, $0 \le \alpha_2 \le 1$, $0 \le q_0 \le 1$.
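
Because the left side of equation (73) increases monotonically with $\hat{\alpha}_1$, the
successive approximations may be organized as a bisection. The sketch below is
one such arrangement; the function names are hypothetical, and the lists `N`
and `x` hold $N_{n,1}$ and $x_{n,1}$ for $n = 1, 2, \ldots$.

```python
def lhs_73(alpha1, x, alpha2, q0):
    """Left side of (73); x[n-1] holds x_{n,1} for n = 1, 2, ..."""
    return sum(x_n / (1.0 - alpha1 * alpha2 ** (n - 1) * q0)
               for n, x_n in enumerate(x, start=1))


def solve_73(N, x, alpha2, q0, tol=1e-10):
    """Bisection for alpha_1-hat in [0, 1) such that lhs_73 equals the sum of N."""
    target = float(sum(N))
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs_73(mid, x, alpha2, q0) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Artificial data equal to their expectations for alpha_1 = 0.70.
    alpha1, alpha2, q0 = 0.70, 0.95, 1.0
    N = [30.0, 25.0, 20.0, 15.0, 10.0]
    x = [N_n * (1.0 - alpha1 * alpha2 ** (n - 1) * q0)
         for n, N_n in enumerate(N, start=1)]
    print(solve_73(N, x, alpha2, q0))
```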


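The variance of the estimate $\hat{\alpha}_1$ is obtained immediately from equation (66)
and the relation


(74)    $\sigma^2 (\hat{\alpha}_1) = \frac{\sum_n W_n^2 \sigma^2 (\hat{\alpha}_{1,n})}{\left( \sum_n W_n \right)^2}.$


The result is


(75)    $\sigma^2 (\hat{\alpha}_1) = \frac{\alpha_1}{q_0 \sum_n W_n}.$


This result may be used to estimate $\sigma^2 (\hat{\alpha}_1)$ by replacing $\alpha_1$ on the right side by
its estimate $\hat{\alpha}_1$.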

The procedure just described for obtaining an unbiased estimate of $\alpha_1$ from
the subset of data for which $k = 1$ may be generalized to obtaining unbiased
estimates of $\alpha_1^k$ for any value of $k$. Analogous to equation (73) we obtain the
relation


(76)    $\sum_n \frac{x_{n,k}}{1 - \widehat{\alpha_1^k}\, \alpha_2^{n-k} q_0} = \sum_n N_{n,k} \qquad (k = 1, 2, \ldots),$


where $N_{n,k}$ is the number of observations for the specified values of $n$ and $k$,
and $x_{n,k}$ is the number of those observations which yield alternative $A_1$. This
equation may be solved numerically for $\widehat{\alpha_1^k}$ and by taking the $k$th root of this
estimate of $\alpha_1^k$ we obtain an estimate of $\alpha_1$; for $k > 1$ these estimates are biased,
of course, but they may be useful in obtaining an improved estimate of $\alpha_1$ from
certain kinds of data.


6d. Monte Carlo checks on the estimates. As a check on the estimation
procedures just described, 30 Monte Carlo runs of 22 trials each were made as
described in Section 4. The operators of equations (38) were used with $\alpha_1 = 0.70$,
$\alpha_2 = 0.95$, and $p_0 = 0$. The parameter $\alpha_2$ was estimated by obtaining a numerical
solution to equation (49) with $\hat{q}_0 = 1$; $\nu$ was taken to be trial number $n$ and
only that portion of the data on each run up through the first $A_1$ occurrence
was used. The result obtained was $\hat{\alpha}_2 = 0.9509$ and the estimate of the standard
deviation of $\hat{\alpha}_2$ obtained from equation (58) was 0.008. The approximate value
obtained from equation (55) was 0.956. Next, the procedure described in Section
6c to estimate $\alpha_1$ for the subset of data for which $k = 1$ gave $\hat{\alpha}_1 = 0.758$ and
$\sigma(\hat{\alpha}_1) = 0.08$. The estimates obtained compare favorably with the true values,
$\alpha_1 = 0.70$ and $\alpha_2 = 0.95$, used in making the Monte Carlo computations.


6e. A related problem of estimating a binomial parameter. A problem related
to the estimation problem discussed in Section 6a, but not a part of the general
stochastic model, may be of some interest. The problem arises when we have a
choice about the kind of information we can receive from binomial sampling.
In a single binomial trial, event $A$ or event $\bar{A}$ occurs. The probability of event
$A$ on a single trial is $\alpha$ and we wish to estimate $\alpha$ on the basis of information
received. Suppose we have a choice: we can know the outcomes of $N$ single
binomial trials, or else we can know the outcome for each of $N$ blocks of $\nu$ trials
in the forms "all trials in the block were $A$'s," or "not all trials in the block were
$A$'s." The probability that all trials in the block are $A$'s is $q_\nu = \alpha^{\nu}$. The question
for the statistician is what value of $\nu$ will give an optimum estimate of $\alpha$. Dorfman
mentions a similar problem concerning Wassermann tests [4]. Blood samples
from several people could be pooled and the Wassermann test on the pool would
be positive or negative. A negative report on the pool implies a negative report
on all blood samples in the pool, while a positive report merely implies that
one or more blood samples in the pool are positive. If we are to make a fixed
number ($N$) of Wassermann tests, and if costs of increasing the size $\nu$ of the pool
are very little, what value of $\nu$ should be chosen to get the best estimate of $\alpha$?
Dorfman was interested in identifying the positive individuals; we are interested
in the estimation problem. The problem is not necessarily restricted to integral
values of $\nu$. For example, let $\alpha$ be the probability that a unit surface area of an
industrial material such as sheet metal has no defects. We plan to inspect a
sample area from each of $N$ sheets of material, but we have a choice about the
size of the area to be inspected. The report of the inspection is either that no
defect was found or that some defect was found. We may be quite wise to use
such an inspection procedure because actual counts of number of defects in an
area can be quite untrustworthy while the reports "no defects" or "some
defects" are comparatively reliable. Here the question is what size area should be
inspected to give the best estimate of $\alpha$, a measure of the quality of the product.

It turns out that R. A. Fisher [7] has already solved this question. He calls
it the dilution problem. In our notation the maximum likelihood estimate
of $\alpha$ is

(77) $\hat{\alpha} = (y/N)^{1/\nu}$,

where y is the number of blocks which have all A's (e.g., negative blood tests
or no defects). We see from equation (57) with $E(N_\nu) = N$ that the asymptotic
variance is

(78) $\sigma^2(\hat{\alpha}) = \dfrac{1 - \alpha^\nu}{N \nu^2 \alpha^{\nu - 2}}$.

We wish to choose $\nu$ such that $\sigma(\hat{\alpha})$ has the smallest possible value for given
$\alpha$ and N. It turns out that the minimizing value of $\nu$ is

(79) $\nu_0 = \dfrac{1.594}{-\log_e \alpha}$,

which agrees, of course, with Fisher's result. This means that if we have any
good preliminary notions of the value of $\alpha$ we can improve the method of estimation
of $\alpha$, compared to the ordinary binomial method, by choosing a value of
$\nu$ based on the preliminary value of $\alpha$. If $\alpha$ is near unity, the use of blocks of
size $\nu$ yields an effective binomial sample of size $\nu N$.
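The constant 1.594 can be checked numerically. Taking the asymptotic variance to be $(1 - \alpha^\nu)/(N \nu^2 \alpha^{\nu-2})$ and writing $t = -\nu \log_e \alpha$, the variance is proportional to $(e^t - 1)/t^2$, whose minimum satisfies $t = 2(1 - e^{-t})$. The following Python sketch solves that equation by fixed-point iteration; the function names and the iteration scheme are illustrative choices, not part of the original analysis.

```python
import math


def optimal_t(tol=1e-12, max_iter=200):
    """Solve t = 2 * (1 - exp(-t)) for the positive root by fixed-point iteration."""
    t = 2.0  # starting guess; the iteration contracts near the root
    for _ in range(max_iter):
        t_new = 2.0 * (1.0 - math.exp(-t))
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t


def asymptotic_variance(alpha, nu, n_blocks):
    """Asymptotic variance of the block estimate of alpha for block size nu."""
    return (1.0 - alpha**nu) / (n_blocks * nu**2 * alpha ** (nu - 2))


def optimal_block_size(alpha):
    """Block size nu_0 = t / (-log alpha) that minimizes the asymptotic variance."""
    return optimal_t() / (-math.log(alpha))


if __name__ == "__main__":
    print(round(optimal_t(), 3))             # the constant in the numerator
    print(round(optimal_block_size(0.9), 2)) # block size for alpha = 0.9
```

For a preliminary value such as $\alpha = 0.9$, the suggested block size is about 15 trials; the variance at neighboring block sizes can be compared with `asymptotic_variance`.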


7. Applications to learning data. We have found the estimation procedures 
described above to be useful in analyzing various kinds of learning data. We 
present two examples. 

The first illustration we will describe only briefly. It is closely related to the 
work of Miller and McGill [9]. The experiments analyzed by those authors 
were of the following sort. A person was presented with a series of R monosyllabic 
words and was then instructed to repeat all he could recall. This procedure was 
repeated for many trials, the order of the words being randomized on each trial. 
For lists of words not too long, Miller and McGill postulated that recall of a 
word increased the probability of recall on the next trial according to an operator 
defined by the first of equations (38); it was further postulated that nonrecall 
of a word left the recall probability unchanged, that is, $\alpha_2 = 1$ in equations
(38). The probabilities of nonrecall, $q_\nu$, after $\nu$ previous recalls, are then given
by equation (43). The maximum likelihood procedures given in Section 6a are
thus appropriate for estimating $\alpha = \alpha_1$ and $q_0$.

The second example, to be described in more detail, is data obtained by 
Solomon and Wynne [10] from experiments on the avoidance training of dogs. 
A dog is placed in a jumping stand and may jump over a barrier to avoid an 
intense electric shock. The shock is turned on 10 seconds after a signal which 
defines the start of a trial, and so on each trial the dog either avoids shock or 
escapes shock. We identify avoidance with alternative $A_1$ and nonavoidance
(escape) with alternative $A_2$. The record of 30 dogs for 20 trials each is given
in Table II; avoidance is denoted by a "1" and nonavoidance by a "0." From
these raw data we obtained the numbers $N_{n,k}$ and $x_{n,k}$, where n refers to trial
number (n = 0, 1, ...), and k is the number of previous avoidances (k = 0,
1, ..., n). Thus, $N_{n,k}$ is the number of dogs on trial n which avoided precisely
k times previous to trial n, and $x_{n,k}$ is the number of those dogs which avoid on
trial n. In Table III we give these quantities, derived from the raw data of
Table II, for k = 0, 1, 2, 3.


We assume that the operators of equations (38) are appropriate for this
experiment, that is, that both avoidance ($A_1$) and nonavoidance ($A_2$) increase
the probability p of avoidance and tend to make it unity. The data strongly
suggest that $p_0$, the initial probability of avoidance, is very near zero and so we
assume $p_0 = 0$ ($q_0 = 1$). Thus, from equation (40), the probabilities $q_{n,k}$ of
nonavoidance on trial n after k previous avoidances are

(80) $q_{n,k} = \alpha_1^k \alpha_2^{n-k}$.

We wish to estimate the parameters $\alpha_1$ and $\alpha_2$ from the data.
First we consider the data up to and including the first avoidance of each
dog (k = 0) and apply the maximum likelihood procedures given in Section 6a.
The index $\nu$ becomes trial number n, $x_\nu$ becomes $x_{n,0}$, and $N_\nu$ becomes $N_{n,0}$.
Equation (55) gives $\alpha_2 = 0.93$, and a numerical solution of equation (54) gives
$\hat{\alpha}_2 = 0.923$. With the aid of equations (57) and (59) we estimate the standard
deviation of $\hat{\alpha}_2$ to be $\sigma(\hat{\alpha}_2) = 0.014$. From the analysis given in Section 6b we
next compute the mean and variance of h which is here interpreted to be the
trial on which the first avoidance occurs. Using $\alpha_2 = 0.92$ and $q_0 = 1$ in equations
(63) and (64) gives E(h) = 4.39 and $\sigma(h) = 2.28$. From the raw data of
Table II one gets a mean value of h = 4.50 and a standard deviation of 2.25.
The close agreement between the computed expectation and the observed
mean of h must be mainly accidental, for the standard deviation of the estimate
$\bar{h}$ is $\sigma(\bar{h}) = 2.3/\sqrt{30} = 0.42$.

TABLE II

Data on 30 dogs obtained by Solomon and Wynne [10]. The entry "1" indicates a dog avoided
and the entry "0" indicates it did not avoid

Next we consider that subset of the data for which the number of previous
avoidances is precisely one (k = 1). The estimation procedures described in
Section 6c are directly applicable. A numerical solution of equation (73) yields
$\hat{\alpha}_1 = 0.732$ and from equation (75) we obtain $\sigma(\hat{\alpha}_1) = 0.095$. Another estimate
of $\alpha_1$ was obtained from the data for k = 2 and equation (76); the result was
$\hat{\alpha}_1 = 0.801$. Still another estimate, from the data for k = 3, was $\hat{\alpha}_1 = 0.705$.
An unweighted mean of the three estimates is about 0.75. The three estimates
of $\alpha_1$ are well within one sigma of the mean 0.75, and so this may be taken as
some small evidence of the appropriateness of the model.

TABLE III

Values of $N_{n,k}$, the number of dogs on trial n with precisely k previous avoidances,
and $x_{n,k}$, the number of those dogs which avoid on trial n, taken from the data
given in Table II

An inference which may be made from the above analysis of the Solomon- 
Wynne data is the following. A trial on which nonavoidance occurs reduces the 
probability of nonavoidance by a factor 0.92, while a trial on which avoidance 
occurs reduces the probability of nonavoidance by a factor 0.75. Thus an avoid- 
ance is worth about 3.5 nonavoidances in teaching the dog to avoid. Such a 
conclusion may be of theoretical interest to psychologists. 
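The arithmetic behind the dog-data figures can be retraced with a short Python sketch. It assumes that, before any avoidance has occurred, the probability of nonavoidance on trial n (numbered from 0) is $\alpha_2^n$, and it sums the distribution of h directly; this is not a transcription of equations (63) and (64), so the standard deviation need not match the quoted 2.28 to the last digit. The "worth" of an avoidance is taken to be the number of nonavoidance factors whose product equals one avoidance factor, $\log \alpha_1 / \log \alpha_2$. Function names are illustrative.

```python
import math


def first_avoidance_moments(alpha2, max_trial=500):
    """Mean and s.d. of h, the trial (numbered from 0) of the first avoidance.

    Assumes the nonavoidance probability on trial n is alpha2**n while the
    number of previous avoidances is zero.
    """
    survive = 1.0   # probability of no avoidance on trials 0 .. n-1
    mean = 0.0
    second = 0.0
    for n in range(max_trial):
        q_n = alpha2**n
        p_first = survive * (1.0 - q_n)  # first avoidance occurs exactly on trial n
        mean += n * p_first
        second += n * n * p_first
        survive *= q_n
    return mean, math.sqrt(second - mean * mean)


def avoidance_worth(alpha1, alpha2):
    """Number of nonavoidance trials equivalent to one avoidance trial."""
    return math.log(alpha1) / math.log(alpha2)


if __name__ == "__main__":
    mean_h, sd_h = first_avoidance_moments(0.92)
    print(round(mean_h, 2), round(sd_h, 2))
    print(round(avoidance_worth(0.75, 0.92), 2))
    print(round(2.3 / math.sqrt(30), 2))  # standard deviation of the observed mean
```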


8. The equal $\alpha$ case. In this section we will consider another special case, one
in which the mathematical analysis of the model is especially simple. We let
$\alpha_1 = \alpha_2 = \alpha$. The two operators become

(81) $Q_1 p = a_1 + \alpha p = \lambda_1(1 - \alpha) + \alpha p$,
     $Q_2 p = a_2 + \alpha p = \lambda_2(1 - \alpha) + \alpha p$.


In equation (17) the term in the second moment drops out and we have left a
simple linear difference equation in the means alone. The solution is

(82) $V_{1,n} = V_{1,\infty} - (V_{1,\infty} - V_{1,0})(a_1 - a_2 + \alpha)^n$,

where

(83) $V_{1,\infty} = \dfrac{a_2}{1 - (a_1 - a_2 + \alpha)} = \dfrac{\lambda_2}{1 - \lambda_1 + \lambda_2}$,

and $\lambda_1$ and $\lambda_2$ are the limits defined by equation (4). The expected operator
$\bar{Q}$, discussed in Section 4, now gives the correct means as may be seen by comparing
equations (17) and (21). Since the means from trial to trial are obtained
by repeated application of this single operator, the changes in the mean are
described by a simple two-state Markov chain with constant transition probabilities,
$a_2$ and $(1 - a_1 - \alpha)$.

In equation (18) for the higher raw moments, the terms in $V_{m+1,n}$ also vanish,
and so the higher moments may be computed readily from that formula. The
equation for the second raw moment becomes

(84) $V_{2,n+1} = a_2^2 + (2\alpha a_2 + a_1^2 - a_2^2)V_{1,n} + (\alpha^2 + 2a_1\alpha - 2a_2\alpha)V_{2,n}$.


After equation (82) is inserted, a difference equation in the second raw moments 
is obtained. The solution turns out to be 


(85) $V_{2,n} = \dfrac{a_2^2 + B V_{1,\infty}}{1 - C}(1 - C^n) - \dfrac{B(V_{1,\infty} - V_{1,0})}{g - C}(g^n - C^n) + V_{2,0}\,C^n$,


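where

$B = 2\alpha a_2 + a_1^2 - a_2^2 = (1 - \alpha)[(\lambda_1^2 - \lambda_2^2)(1 - \alpha) + 2\alpha\lambda_2]$,
$C = \alpha^2 + 2a_1\alpha - 2a_2\alpha = \alpha[\alpha + 2(\lambda_1 - \lambda_2)(1 - \alpha)]$,
(86) $g = a_1 - a_2 + \alpha = 1 - (1 - \alpha)(1 - \lambda_1 + \lambda_2)$.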


As unsightly as this solution may appear, it is an exact expression for $V_{2,n}$
as a function of n and the parameters. Such an exact closed expression is not at
present available for the general two-operator model except when $\alpha_1 = \alpha_2 = \alpha$.
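The moment recurrences for the equal $\alpha$ case are easy to check numerically. The Python sketch below iterates the mean recurrence $V_{1,n+1} = a_2 + g V_{1,n}$ and the second-moment recurrence $V_{2,n+1} = a_2^2 + B V_{1,n} + C V_{2,n}$, with $B = 2\alpha a_2 + a_1^2 - a_2^2$, $C = \alpha^2 + 2a_1\alpha - 2a_2\alpha$, and $g = a_1 - a_2 + \alpha$, and compares them with closed forms. It assumes every subject starts at the same probability $p_0$, so that $V_{1,0} = p_0$ and $V_{2,0} = p_0^2$. The closed form for the second moment used here was obtained by solving the recurrence directly for this illustration and may be arranged differently from the paper's expression; it requires $C \neq 1$ and $g \neq C$. The parameter values are hypothetical.

```python
def coefficients(a1, a2, alpha):
    """Coefficients B, C, g of the equal-alpha moment recurrences."""
    b = 2 * alpha * a2 + a1**2 - a2**2
    c = alpha**2 + 2 * a1 * alpha - 2 * a2 * alpha
    g = a1 - a2 + alpha
    return b, c, g


def iterate_moments(a1, a2, alpha, p0, n_trials):
    """Iterate the mean and second-raw-moment recurrences from a common p0."""
    b, c, g = coefficients(a1, a2, alpha)
    v1, v2 = p0, p0 * p0
    history = [(v1, v2)]
    for _ in range(n_trials):
        # Both updates use the moments of the current trial.
        v1, v2 = a2 + g * v1, a2**2 + b * v1 + c * v2
        history.append((v1, v2))
    return history


def closed_form_moments(a1, a2, alpha, p0, n):
    """Closed-form mean and second raw moment on trial n (needs C != 1, g != C)."""
    b, c, g = coefficients(a1, a2, alpha)
    v_inf = a2 / (1 - g)
    v1 = v_inf - (v_inf - p0) * g**n
    v2 = ((a2**2 + b * v_inf) * (1 - c**n) / (1 - c)
          - b * (v_inf - p0) * (g**n - c**n) / (g - c)
          + p0**2 * c**n)
    return v1, v2


if __name__ == "__main__":
    hist = iterate_moments(0.15, 0.05, 0.7, 0.4, 30)
    for n in (0, 1, 5, 30):
        print(n, hist[n], closed_form_moments(0.15, 0.05, 0.7, 0.4, n))
```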

Four parameters remain to be estimated: $p_0$, $a_1$, $a_2$, and $\alpha$. It may be noted
from equations (82) and (83) that at most three of these could be estimated
from the means; these means depend only upon $V_{1,0} = p_0$, $a_2$, and $(a_1 - a_2 + \alpha)$.
One might expect, however, that the variance of the data from trial to trial
along with equation (85) could be used to estimate the fourth parameter. We
shall indicate how this procedure might be feasible by making further restrictions.


8a. Equal $\alpha$ case, upper limit unity, lower limit zero. We now require that
$a_2 = 0$ and $a_1 = 1 - \alpha$ in addition to requiring that $\alpha_1 = \alpha_2 = \alpha$. These further
restrictions imply that the limits defined by equation (4) are $\lambda_1 = 1$ and $\lambda_2 = 0$.
The two operators become

(87) $Q_1 p = 1 - \alpha + \alpha p$,
     $Q_2 p = \alpha p$.

We have found this case useful for analyzing T-maze data on rats with identical
and equally frequent rewards on the two sides of the maze (Stanley, [11]).
Equation (82) then becomes

(88) $V_{1,n} = p_0$; n = 0, 1, 2, ....

The recurrence relation (84) for the second raw moment becomes

(89) $V_{2,n+1} = (1 - \alpha)^2 p_0 + \alpha(2 - \alpha)V_{2,n}$.

The solution of this linear difference equation is

(90) $V_{2,n} = p_0 - p_0(1 - p_0)\beta^n$,

where

(91) $\beta = \alpha(2 - \alpha) = 1 - (1 - \alpha)^2$.

This result may be obtained from equation (85) with $a_2 = 0$ and $a_1 = 1 - \alpha$.
The variance of the probability distribution on trial n is

(92) $\sigma_n^2 = V_{2,n} - p_0^2 = p_0(1 - p_0)(1 - \beta^n)$.

From this result we see that the variance is zero for n = 0 and approaches the
binomial variance, $p_0(1 - p_0)$, as n gets large providing $|\beta| < 1$. It can be
shown that the distribution approaches a distribution with density $p_0$ at unity
and density $(1 - p_0)$ at zero.
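A small Monte Carlo experiment illustrates these results. The Python sketch below simulates many independent subjects under the operators $Q_1 p = 1 - \alpha + \alpha p$ (applied when $A_1$ is chosen, which happens with probability p) and $Q_2 p = \alpha p$ (applied otherwise), and compares the sample mean and variance of the probabilities after n trials with $p_0$ and $p_0(1 - p_0)(1 - \beta^n)$, where $\beta = \alpha(2 - \alpha)$. The parameter values, the seed, and the function names are hypothetical choices for illustration.

```python
import random


def simulate_probabilities(alpha, p0, n_trials, n_subjects, seed=0):
    """Return each simulated subject's probability of A1 after n_trials trials."""
    rng = random.Random(seed)
    final_probs = []
    for _ in range(n_subjects):
        p = p0
        for _ in range(n_trials):
            if rng.random() < p:
                p = 1.0 - alpha + alpha * p   # A1 occurred: apply Q1
            else:
                p = alpha * p                 # A2 occurred: apply Q2
        final_probs.append(p)
    return final_probs


def mean_and_variance(values):
    """Sample mean and (population-style) variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


def theoretical_variance(alpha, p0, n):
    """Variance p0 (1 - p0) (1 - beta**n) with beta = alpha (2 - alpha)."""
    beta = alpha * (2.0 - alpha)
    return p0 * (1.0 - p0) * (1.0 - beta**n)


if __name__ == "__main__":
    probs = simulate_probabilities(alpha=0.8, p0=0.3, n_trials=10, n_subjects=20000)
    print(mean_and_variance(probs), theoretical_variance(0.8, 0.3, 10))
```

The simulated mean stays near $p_0$ while the variance grows toward the binomial value, which is the behavior the estimation methods of the following subsections rely on.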

From equation (88) we see that the observed means from trial to trial may be
used to estimate $p_0$, but that the means provide no information about the parameter
$\alpha$. Equation (92), on the other hand, shows that the variances, $\sigma_n^2$, of the
distributions of probabilities depend upon $\beta$ and hence upon $\alpha$. As a result, one
might expect to obtain an estimate of $\alpha$ from these variances. Such a procedure
would lead to a simple double estimation problem as will now be indicated.

On trial n, a distribution of probabilities $p_{in}$ exists. If one has data on K
subjects, these data correspond to a sample of K probabilities $p_{in}$ from the
population of all possible values of $p_{in}$ on trial n. If one knew the values of these
K probabilities, then one could readily estimate the population mean and
variance from the sample mean and variance. But the K probabilities, $p_{in}$, are
not known, of course. Each $p_{in}$ becomes the mean of a binomial distribution of
random variables $x_{in}$. If the ith subject chooses $A_1$ on trial n, then $x_{in} = 1$,
while if he chooses $A_2$ on trial n, then $x_{in} = 0$. The set of $x_{in}$'s provides the only
information we have about the K probabilities $p_{in}$. The problem is clear:
we must use the $x_{in}$ to estimate the $p_{in}$ and then use these estimates to estimate
the properties of the distribution of all possible $p_{in}$'s on the nth trial.


8b. Estimation of $p_0$. From equation (88) we see that the mean of the distribution
of all possible probabilities $p_{in}$ is $p_0$ on every trial for the case being
considered, namely, $\alpha_1 = \alpha_2 = \alpha$, $\lambda_1 = 1$, and $\lambda_2 = 0$. For a sample of K subjects
we have a sample of K probabilities $p_{in}$, having a mean $\bar{p}_n$. This sample mean,
$\bar{p}_n$, provides an estimate of $V_{1,n} = p_0$. The sample mean, $\bar{p}_n$, is estimated in
turn by the proportion $\bar{x}_n$ of subjects choosing $A_1$ on trial n:

(93) $\bar{x}_n = \dfrac{1}{K} \sum_{i=1}^{K} x_{in}$.

Thus an estimate of $p_0$ is

(94) $(\hat{p}_0)_n = \bar{x}_n$.

Such an estimate of $p_0$ is obtained from the data on each trial and so one can
combine these estimates to obtain an improved estimate of $p_0$. We have not
worked out an optimum way of combining the trial estimates, but one estimate
of $p_0$ is obtained from a simple average of the individual trial estimates:

(95) $\hat{p}_0 = \dfrac{1}{N + 1} \sum_{n=0}^{N} \bar{x}_n = \dfrac{1}{K(N + 1)} \sum_{n=0}^{N} \sum_{i=1}^{K} x_{in}$.

In other words, we estimate $p_0$ by the proportion of choices of $A_1$ in the entire
set of data.


8c. Estimation of $\alpha$. Although we do not have an entirely satisfactory method
of estimating $\alpha$, we provide one method and invite (as in all these problems)
suggestions for improving the estimation process.

We will break the data up into two subsets $S_1$ and $S_2$ such that all sequences
in $S_1$ begin with an occurrence of $A_1$ and all those in $S_2$ begin with an occurrence
of $A_2$. Thus, $x_{i0} = 1$ for $i \in S_1$ and $x_{i0} = 0$ for $i \in S_2$. On trial n = 1, sequences
in $S_1$ will have probability $Q_1 p_0 = 1 - \alpha + \alpha p_0$ and sequences in $S_2$ will have
probability $Q_2 p_0 = \alpha p_0$. Therefore we may consider all sequences in $S_1$ to have
an initial probability $Q_1 p_0$ and all those in $S_2$ to have an initial probability
$Q_2 p_0$. According to equation (88), then, the means $V_{1,n}$ will equal these initial
probabilities on all future trials. Hence, we can estimate $Q_1 p_0$ by the proportion
$P_1$ of $A_1$ occurrences in $S_1$ for trials n = 1, 2, ..., and similarly can estimate
$Q_2 p_0$ by the proportion $P_2$ of $A_1$ occurrences in $S_2$ for trials n = 1, 2, .... We
then observe that

(96) $Q_1 p_0 - Q_2 p_0 = 1 - \alpha$.

Accordingly we can estimate $\alpha$ by

(97) $\hat{\alpha} = 1 - (P_1 - P_2)$.

This estimate is very easy to obtain from a set of data, but it clearly does not
utilize all the information in the data. More efficient procedures are undoubtedly
available.

2 This subsection was revised and considerably simplified in accordance with a suggestion
by a referee.
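The two estimates for this case, the overall proportion of $A_1$ choices for $p_0$ and $1 - (P_1 - P_2)$ for $\alpha$, can be tried on simulated data. The Python sketch below generates choice sequences under the operators $Q_1 p = 1 - \alpha + \alpha p$ and $Q_2 p = \alpha p$, then applies both estimates. The data layout (one list of 0/1 choices per subject, trial 0 first), the parameter values, and the function names are assumptions made for this illustration.

```python
import random


def simulate_choices(alpha, p0, n_trials, n_subjects, seed=1):
    """Simulate 0/1 choice sequences (1 means A1), one list per subject."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        p, row = p0, []
        for _ in range(n_trials):
            x = 1 if rng.random() < p else 0
            row.append(x)
            # Apply Q1 after an A1 choice, Q2 after an A2 choice.
            p = 1.0 - alpha + alpha * p if x == 1 else alpha * p
        data.append(row)
    return data


def estimate_p0(data):
    """Proportion of A1 choices in the entire set of data."""
    total = sum(sum(row) for row in data)
    return total / (len(data) * len(data[0]))


def estimate_alpha(data):
    """One minus the difference in A1 proportions after trial 0 between S1 and S2."""
    s1 = [row[1:] for row in data if row[0] == 1]   # sequences starting with A1
    s2 = [row[1:] for row in data if row[0] == 0]   # sequences starting with A2
    p1 = sum(map(sum, s1)) / sum(map(len, s1))
    p2 = sum(map(sum, s2)) / sum(map(len, s2))
    return 1.0 - (p1 - p2)


if __name__ == "__main__":
    data = simulate_choices(alpha=0.8, p0=0.5, n_trials=6, n_subjects=8000)
    print(round(estimate_p0(data), 3), round(estimate_alpha(data), 3))
```

With a few thousand simulated subjects both estimates land close to the generating values; with the handful of subjects typical of an experiment the sampling error of the $\alpha$ estimate is considerably larger.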

8d. Estimation of $\alpha$ when limits are not zero and unity. Finally we propose
one procedure for estimating $\alpha$ from data when the limits are $0 < \lambda_1 < 1$ and
$0 < \lambda_2 < 1$. The means, $V_{1,n}$, from trial to trial are given by equation (82),
which we write in the form

(98) $V_{1,n} = V_{1,\infty} - (V_{1,\infty} - V_{1,0})\,g^n$,

where $V_{1,\infty}$ and g are defined by equations (83) and (86), namely

(99) $V_{1,\infty} = \dfrac{\lambda_2}{1 - \lambda_1 + \lambda_2}$,

(100) $g = 1 - (1 - \alpha)(1 - \lambda_1 + \lambda_2)$.

Now the mean $V_{1,n}$ on trial n may be estimated by the proportion $P_n$ of $A_1$
occurrences on trial n, that is

(101) $P_n = \dfrac{1}{K} \sum_{i=1}^{K} x_{in}$,

where K is the number of available sequences in the data. We then may sum
these proportions $P_n$ over all trials n = 0, 1, ..., N - 1, to obtain

(102) $P = \sum_{n=0}^{N-1} P_n = \dfrac{1}{K} \sum_{n=0}^{N-1} \sum_{i=1}^{K} x_{in}$.

This quantity P is simply the total number of $A_1$ occurrences in the data divided
by K, the number of sequences. Since $P_n$ estimates $V_{1,n}$, we must sum $V_{1,n}$ of
equation (98) over trials. We call this sum $S_N$:

(103) $S_N = \sum_{n=0}^{N-1} V_{1,n} = \sum_{n=0}^{N-1} \left[ V_{1,\infty} - (V_{1,\infty} - V_{1,0})\,g^n \right] = N V_{1,\infty} - (V_{1,\infty} - V_{1,0}) \dfrac{1 - g^N}{1 - g}$.

The quantity P of equation (102) estimates $S_N$, and so we thus can solve for g
in terms of $V_{1,\infty}$, $V_{1,0}$, and N. When we know in advance the values of $\lambda_1$, $\lambda_2$,
and $V_{1,0}$, we may estimate $\alpha$. In particular when $g^N \ll 1$ we have as the estimate
of $\alpha$,

(104) $\hat{\alpha} = 1 - \dfrac{V_{1,\infty} - V_{1,0}}{(1 - \lambda_1 + \lambda_2)(N V_{1,\infty} - P)}$.

We have found this estimation scheme useful for analyzing data on human
subjects in a two-choice situation.


8e. Applications to learning experiments. In this section we will apply the 
model of Section 8 and the estimation procedure given in Section 8d to three 
sets of data on behavior in a choice situation. The first set of data was obtained 
by Stanley from seven rats in a T-maze experiment [11]. On each trial the 
rat could turn either left or right in the maze, and for the portion of Stanley’s 
data being considered here, the rat always found food on one side (alternative
$A_1$) and never found food on the other side (alternative $A_2$). The second set of
data was obtained by the authors with the assistance of Miss J. M. Jarrett from 
five Harvard undergraduates operating a machine called the ‘“‘two-armed 
bandit” (work unpublished). On each trial the subject pushed one of two buttons; 
one choice was always followed by a penny reward and the other side never 
led to reward. The third set of data was obtained by R. R. Bush, R. L. Davis, 
and G. L. Thompson on six high school students in Santa Monica, California 
(work unpublished). In this experiment, the subjects were presented with two 
ordinary playing cards, face down, on each trial, and they were told to turn 
over one of the two cards; if the card turned over was a heart or diamond they 
received a reward of a nickel. All cards in one position were reward cards, and 
all cards in the other position were nonreward cards. 

In all three experiments we identify the choice which leads to reward with
alternative $A_1$ and the other choice with alternative $A_2$. We assume that the
operators of equation (81) are appropriate, and we take $\lambda_1 = \lambda_2 = 1$, that is,
we assume that either choice tends to make the probability p of $A_1$ equal to
unity. From equation (83) we see that $V_{1,\infty} = 1$ and we assume that $V_{1,0} = 0.5$.
Thus equation (104) becomes

(105) $\hat{\alpha} = 1 - \dfrac{0.5}{N - P}$,

where P is defined by equation (102) and is the mean number of choices $A_1$ up
through trial N. Thus, $N - P$ is the mean number of errors (choices of $A_2$) made
by the K subjects in each experiment. The results of the three experiments are
summarized in Table IV.
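The estimate used here is a one-line computation once the total P is in hand. The Python sketch below implements the general form, $\hat{\alpha} = 1 - (V_{1,\infty} - V_{1,0}) / [(1 - \lambda_1 + \lambda_2)(N V_{1,\infty} - P)]$ with $V_{1,\infty} = \lambda_2/(1 - \lambda_1 + \lambda_2)$, which is valid when $g^N$ is negligible, together with the expected total $S_N$ so that the two can be checked against each other. Function and argument names are illustrative. With $\lambda_1 = \lambda_2 = 1$ the estimate depends only on the mean number of errors $N - P$, so the value of N in the example is arbitrary.

```python
def limit_mean(lam1, lam2):
    """Asymptotic mean V_{1,infinity} = lam2 / (1 - lam1 + lam2)."""
    return lam2 / (1.0 - lam1 + lam2)


def expected_total(alpha, n_trials, lam1, lam2, v10):
    """Expected sum S_N of the trial means over trials 0 .. N-1."""
    g = 1.0 - (1.0 - alpha) * (1.0 - lam1 + lam2)
    v_inf = limit_mean(lam1, lam2)
    return n_trials * v_inf - (v_inf - v10) * (1.0 - g**n_trials) / (1.0 - g)


def estimate_alpha_from_total(total_p, n_trials, lam1, lam2, v10):
    """Estimate of alpha from the total P, neglecting g**N."""
    v_inf = limit_mean(lam1, lam2)
    return 1.0 - (v_inf - v10) / ((1.0 - lam1 + lam2) * (n_trials * v_inf - total_p))


if __name__ == "__main__":
    # A mean of 3.5 errors with both limits at unity and an initial mean of 0.5.
    n = 100  # arbitrary here: only N - P matters in this special case
    print(round(estimate_alpha_from_total(n - 3.5, n, 1.0, 1.0, 0.5), 3))
```

A mean of 3.5 errors gives an estimate of about 0.857, the figure reported for the high school students.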


9. Discussion. The stochastic processes described in this paper are closely
related to Markov chains [6]. In fact, the process we defined can be considered
to be a Markov chain if correctly viewed. A Markov chain is characterized by
the property of "path independence." The system can exist in a number of states
and if it is in the ith state, the transition probabilities to all other states are
independent of how the system arrived in the ith state. Now if we identify
the alternatives in our model with the states of the system, the process is clearly
non-Markovian; the transition probabilities change as required by the operators
of equations (1). The process defined by our model is a Markov chain, however,
if we identify the states of the system with the values of p. Of course, an infinite
number of states are then possible. If the system is in state p then the transition
probabilities to ¢ other states are given by equations (1) and the probabilities 
of transition to all other states are zero. In spite of this observation, we have 
made little use of the theory of Markov chains with an infinite number of states. 

A special case which we have not handled satisfactorily is that for which
$\lambda_2 = 0$ and $\lambda_1 = 1$, that is, for which $a_2 = 0$ and $a_1 = 1 - \alpha_1$. In this case the
bounds on the means described in Section 4 are of little use since they demand
only that the asymptotic mean lie between zero and unity. It is easily shown
that all the asymptotic raw moments are equal, provided that a stable asymptotic
distribution exists at all. This would mean that the density tends to be concentrated
at zero and unity. The proportion which would be concentrated at
unity is $V_{1,\infty}$. We have shown that $V_{1,\infty}$ would depend upon $p_0$ as well as upon
$\alpha_1$ and $\alpha_2$, but we have not obtained a closed expression for $V_{1,\infty}$ as a function
of these quantities. Likewise, we have not satisfactorily handled the estimation
problem for this case even though this case seems to be of practical interest in
some learning problems.


TABLE IV
Data and computations for three experiments on two-choice situations. The rat
data are from Stanley's T-maze experiment [11]. The two groups of Harvard students
were studied by Bush, Mosteller, and Jarrett using the "two-armed bandit";
the group marked "pay" could either lose or break even on each trial, while the
group marked "free" could either break even or win. The data on high school students
were obtained by Bush, Davis, and Thompson in Santa Monica, California.


a Harvard Students High School


Students


Pay Free


Number subjects cme

Number trials 2:

Mean errors............ ; * 4, 3.5
Estimate $\hat{\alpha}$ .955 , . 96: 0.857

More generally, the outstanding problems seem to be the need for better
expressions for the moments, or at least improved bounds for these moments,
and more efficient estimation procedures in the cases we have discussed, and
estimation procedures for the less special cases we have not discussed. The
estimation procedures would no doubt depend on the particular type of data
available; the values of some parameters may be known from symmetry considerations
or from other experiments. Furthermore, some kinds of data provide
more than one binomial observation per subject per trial, while others provide
only one such observation. These considerations complicate the issues, so that
a model whose parameters cannot be easily estimated for one type of experiment
may be satisfactory in another type.


LOCALLY OPTIMAL DESIGNS FOR ESTIMATING PARAMETERS
By Herman Chernoff


Stanford University 


1. Summary. It is desired to estimate s parameters $\theta_1, \theta_2, \ldots, \theta_s$. There is
available a set of experiments which may be performed. The probability distribution
of the data obtained from any of these experiments may depend on
$\theta_1, \theta_2, \ldots, \theta_k$, $k \geq s$. One is permitted to select a design consisting of n of
these experiments to be performed independently. The repetition of experiments
is permitted in the design. We shall show that, under mild conditions, locally
optimal designs for large n may be approximated by selecting a certain set of
$r \leq k + (k - 1) + \cdots + (k - s + 1)$ of the experiments available and by
repeating each of these r experiments in certain specified proportions. Examples
are given illustrating how this result simplifies considerably the problem of 
obtaining optimal designs. The criterion of optimality that is employed is one 
that involves the use of Fisher’s information matrix. For the case where it is 
desired to estimate one of the k parameters, this criterion corresponds to mini- 
mizing the variance of the asymptotic distribution of the maximum likelihood 
estimate of that parameter. 

The result of this paper constitutes a generalization of a result of Elfving 
[1]. As in Elfving’s paper, the results extend to the case where the cost depends 
on the experiment and the amount of money to be allocated on experimentation 
is determined instead of the sample size. 


2. Introduction. Before formulating the problem precisely we shall consider 
a simple special example which will illustrate many of the points involved. 
Consider the regression problem 


(1) $y = \gamma + \beta x + u$, $\quad -1 \leq x \leq 1$,


where u is an unobserved disturbance which is normally distributed with mean
0 and variance 1. The disturbances of successive observations are distributed
independently of each other. Suppose that we are permitted to select a set of
n values of x between -1 and +1 and to observe the corresponding values of
y. If our objective were to estimate $\beta$, it is well known that the best procedure
consists of using x = +1 for half of the observations and x = -1 for the other
half.

In this problem we may regard the observation of a y corresponding to a
given value of x as an experiment $E_x$. The class of available experiments is the
set $\{E_x : -1 \leq x \leq 1\}$. The parameter in which we are interested is $\beta$, but the
distribution of the data depends on $\gamma$ also. In this case $\gamma$ is a nuisance parameter.
The optimal design consists of using each of the two experiments $E_1$ and $E_{-1}$
half the time (if n is even). It should be noted that if the set of experiments
available were decreased so that $E_x$ is available only for -1 < x < 1, no optimal
design could be found. This is essentially due to the fact that given any design,
a better one can be obtained by spreading out the values of x even more (i.e.,
by taking values of x closer to the end points -1 and +1).

A peculiarity of this particular problem is that no matter how many times a
particular experiment $E_x$ is repeated, no reasonable estimate of $\beta$ can be determined.
At least two distinct experiments are required. Another peculiarity of
this problem is that the variance of $\hat{\beta}$, the maximum likelihood estimate of $\beta$,
does not depend on the value of $\gamma$ and $\beta$. In general, this latter property will
not hold and we shall be restricted to obtaining locally optimal designs, that is,
designs which are optimal if the parameters are known to be close to certain
specified values.

We may consider a variation of the above problem. Suppose that it is desired
to estimate $\gamma$ and that $\beta$ is the nuisance parameter. Then it is well known that
an optimal design consists in repeating the experiment $E_0$ n times. An equally
optimal design may also be obtained by using any set of x's so that $\bar{x} = 0$.
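These statements about the regression example can be checked with the standard least-squares variance formulas for a straight line with unit error variance: the slope estimate has variance $1/\sum (x_i - \bar{x})^2$ and the intercept estimate has variance $1/n + \bar{x}^2/\sum (x_i - \bar{x})^2$. The Python sketch below evaluates both for a few designs. The function name and the convention of returning infinity for a parameter that cannot be estimated are choices made for this illustration.

```python
import math


def design_variances(xs):
    """Variances of the slope and intercept estimates for design points xs.

    Assumes unit error variance. Returns math.inf for a parameter that the
    design cannot estimate.
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0.0:
        # All observations at one point: the slope is not estimable, and the
        # intercept is estimable only if that point is x = 0.
        var_intercept = 1.0 / n if xbar == 0.0 else math.inf
        return math.inf, var_intercept
    return 1.0 / sxx, 1.0 / n + xbar**2 / sxx


if __name__ == "__main__":
    endpoints = [1.0] * 5 + [-1.0] * 5              # half at +1, half at -1
    spread = [-1.0 + 2.0 * i / 9 for i in range(10)]  # evenly spaced points
    origin = [0.0] * 10                              # experiment E_0 repeated
    for name, xs in (("endpoints", endpoints), ("spread", spread), ("origin", origin)):
        print(name, design_variances(xs))
```

The endpoint design gives the smallest slope variance (1/n), any design with $\bar{x} = 0$ gives intercept variance 1/n, and repeating a single experiment leaves the slope undetermined.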


3. Information matrices and mixed experiments. The formulation of our 
problem will involve the concepts of information matrices [2] and of randomized 
or mixed experiments. For the sake of notational convenience and in order to 
clear up some technicalities that arise, we shall discuss these concepts before 
proceeding to the formulation. 

R. A. Fisher defined the information matrix $X(\theta)$ for an experiment involving
the parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ by

(2) $X(\theta) = -\left\| E\left\{ \dfrac{\partial^2 L}{\partial\theta_i \, \partial\theta_j} \right\} \right\| = \| x_{ij} \|$, $\quad i, j = 1, 2, \ldots, k$,

where L is the logarithm of the likelihood function. It should be noted that $X(\theta)$
ordinarily depends on $\theta$. It is easily seen and well known that

(3) $X(\theta) = \left\| E\left\{ \dfrac{\partial L}{\partial\theta_i} \dfrac{\partial L}{\partial\theta_j} \right\} \right\|$

and hence that $X(\theta)$ is a nonnegative definite symmetric matrix.

Another well known property of information matrices is that of additivity.
That is, if $E_1, E_2, \ldots, E_n$ are experiments yielding information matrices
$X_1(\theta), X_2(\theta), \ldots, X_n(\theta)$, the combined experiment or design which consists
in carrying out each of these experiments independently yields the information
matrix $X_1(\theta) + X_2(\theta) + \cdots + X_n(\theta)$.

The experiment which consists in carrying out one of the available experiments,
this one to be determined by a random device, is called a randomized
or mixed experiment. Hence if $p_1, p_2, \ldots, p_n$ are positive numbers adding up
to one, the experiment which consists in carrying out $E_i$ with probability $p_i$ is
mixed. It is easily seen that this experiment has information matrix
$p_1 X_1(\theta) + p_2 X_2(\theta) + \cdots + p_n X_n(\theta)$.

Let an experiment E with positive definite information matrix $X(\theta)$ be carried
out m times and let $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$ be the resulting maximum likelihood
estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. Under mild conditions [3], the covariance
matrix of the asymptotic (as $m \to \infty$) distribution of $\sqrt{m}(\hat{\theta} - \theta)$ is given by

(4) $X^{-1}(\theta) = \| x^{ij}(\theta) \|$, $\quad i, j = 1, 2, \ldots, k$,

at all points of continuity of $X(\theta)$. This property suggests the usefulness of
information matrices in comparing designs.

Unfortunately, it is possible for an information matrix to be singular and
hence to fail to have an inverse. To allow for this situation, we extend the notion
of inverse to the class of nonnegative definite symmetric matrices. Let X be
nonnegative definite symmetric and let Y be any other symmetric matrix so that
$X + \lambda Y$ is positive definite for $\lambda$ positive and small enough. Then, let

(5) $X^{-1} = \| x^{ij} \| = \lim_{\lambda \to 0+} (X + \lambda Y)^{-1}$.

In Appendix A it will be shown that this new definition is consistent with the
usual definition and is statistically meaningful. Also, if $x^{ii}$ and $x^{jj}$ are finite,
then $x^{ij}$ is finite and $x^{ii}$, $x^{jj}$, and $x^{ij}$ are independent of the particular Y selected.
It should be noted that $X^{-1}$ is a continuous function of X on the set of positive
definite symmetric matrices but that elements of $X^{-1}$ may fail to be continuous
for X singular.
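The limiting definition of the inverse can be illustrated on a 2 by 2 example. The Python sketch below takes the singular matrix with rows (0, 0) and (0, 1), which is a hypothetical illustration rather than an example from the text, perturbs it by $\lambda Y$ for two different choices of Y, and inverts the result for a small $\lambda$. The second diagonal element of the inverse settles at 1 whichever Y is used, while the first diagonal element grows without bound, so only the former is finite in the limit.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def perturbed_inverse(x, y, lam):
    """Inverse of X + lam * Y for 2 x 2 matrices X and Y."""
    m = [[x[i][j] + lam * y[i][j] for j in range(2)] for i in range(2)]
    return inverse_2x2(m)


if __name__ == "__main__":
    x = [[0.0, 0.0], [0.0, 1.0]]         # singular, nonnegative definite
    y_identity = [[1.0, 0.0], [0.0, 1.0]]
    y_other = [[1.0, 0.5], [0.5, 1.0]]   # another positive definite choice of Y
    for lam in (1e-2, 1e-4, 1e-6):
        inv_a = perturbed_inverse(x, y_identity, lam)
        inv_b = perturbed_inverse(x, y_other, lam)
        print(lam, inv_a[1][1], inv_b[1][1], inv_a[0][0])
```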


4. Formulation. In this section we shall formulate our problem and then in- 
dicate the reasons behind this formulation. Using the special example previ- 
ously mentioned, we shall examine conditions which we shall impose to obtain 
the desired results. 

There is a set {E} of experiments available. The distribution of the data
from one of these experiments depends on $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The information
matrix $X(\theta)$ may be characterized by the elements on and above the main
diagonal. These elements arranged in some order may be considered as components
of a vector in k(k + 1)/2 dimensional space. This vector may be identified
with the matrix. Since we are interested in locally optimal designs, that is,
designs that are optimal when $\theta$ is known to be close to some given value, say
$\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \ldots, \theta_k^{(0)})$, we confine our attention to $X(\theta^{(0)})$.

Let $R_1$ be the set of vectors corresponding to the $X(\theta^{(0)})$ for the experiments of
{E}. Let R be the convex hull of $R_1$, that is, a typical element of R is the convex
linear combination $p_1 X_1 + p_2 X_2 + \cdots + p_n X_n$ where $X_1, X_2, \ldots, X_n$ are
elements of $R_1$ and $p_1, p_2, \ldots, p_n$ are positive numbers adding up to 1.

From the previous section, it follows that R represents the set of information
matrices of the class of mixed experiments.


1 The author is indebted to Max A. Woodbury and the referee who independently pointed 
out a close relationship existing between this definition of inverse and the concept of the 
pseudo inverse of a matrix.

We shall be interested in showing that under certain conditions, an element $\bar{X}$
of R which minimizes

(6) $v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}$

can be represented as a convex linear combination of

(7) $r \leq k + (k - 1) + \cdots + (k - s + 1)$

elements $X_1, X_2, \ldots, X_r$ of $R_1$.

It is evident that $\bar{X}$ corresponds to a mixed experiment which is "optimal"
in the sense that if $\hat{\theta}$ were based on n repetitions of this experiment, the sum
of the variances in the asymptotic (as $n \to \infty$) distribution of $\sqrt{n}(\hat{\theta}_1 - \theta_1)$,
$\sqrt{n}(\hat{\theta}_2 - \theta_2)$, ..., $\sqrt{n}(\hat{\theta}_s - \theta_s)$ would be a minimum.

Certain questions naturally arise concerning the usefulness of this criterion.
First, it may be asked whether this criterion is relevant if one desires to confine
oneself to pure experiments, that is, elements of {E}. Here we note that as
$n \to \infty$, $\bar{X}$ may be approximated by $(n_1 X_1 + n_2 X_2 + \cdots + n_r X_r)/n$ where $n_1$,
$n_2, \ldots, n_r$ are positive integers adding up to n. The latter expression represents
1/n times the information matrix corresponding to the design where $E_i$ is carried
out $n_i$ times. The answer to the last question would be yes if it were shown that
$v_s(X)$ is continuous at $X = \bar{X}$ on the convex set generated by $X_1, X_2, \ldots,
X_r$.

One may also ask why our criterion should involve information matrices. 
Such a criterion has a certain aesthetic appeal. Furthermore, we shall discuss 
in Appendix B how the main result yields a justification of this criterion. 

Finally, one may seriously inquire whether a “good” design must minimize 
the sum of the asymptotic variances. In fact, we shall see in Appendix C that 
very often when one is interested in s parameters, a sound criterion for a “good” 
design involves minimizing tr(AV) where A is a nonnegative definite symmetric 
matrix of rank less than or equal to s and V is the covariance matrix of the 
asymptotic distribution of $\sqrt{n}(\hat\theta - \theta)$. By a linear
transformation of $\theta$ this criterion may be transformed to that of
minimizing the sum of no more than $s$ asymptotic variances.

Since certain conditions must be imposed to obtain our desired result, we 
shall explain these conditions by referring to the example considered in Section 
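2. In that example the experiment $E_x$ yields a likelihood function with
logarithm given by

    $L = -\tfrac{1}{2} \log 2\pi - \tfrac{1}{2}(y - \gamma - \beta x)^2$.

Let $\theta_1 = \beta$ and $\theta_2 = \gamma$. The corresponding information
matrix is given by

    $X_x = \begin{pmatrix} x^2 & x \\ x & 1 \end{pmatrix}$.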


For this example, $R_1$ is the set of all points in three dimensional space
whose coordinates are $(x^2, x, 1)$, $-1 \le x \le 1$. This set represents a
segment of a parabola lying in a plane of three dimensional space. The convex
set $R$ generated by $R_1$ is the set bounded by $R_1$ and the line segment
connecting the end points of $R_1$. The optimal design, consisting in using
$X_1$ and $X_{-1}$ each half the time, corresponds to the mid-point of the
above-mentioned line segment, that is, the point $(1, 0, 1)$.

We mentioned previously that if $x$ is restricted to $-1 < x < 1$, no optimal
$n$th order design exists. Note that in this case $R$ has been changed by
deleting the boundary line segment on which the optimizing point $(1, 0, 1)$
lies. Although we can get arbitrarily close to this point when $-1 < x < 1$,
we cannot reach it. In general, to prevent this minor difficulty we shall
impose the condition that $R$ be closed. Then $R$ will contain all of its
boundary points.

A second condition that we shall impose is that $R$ be bounded. That is, no
element of $X$ can be made arbitrarily large by selecting the experiment $E$
properly. This condition is satisfied in our example, for there no element can
exceed 1 in absolute value. If, however, the example were modified to permit
all real values of $x$, the element of the first row and first column of $X_x$
would be unbounded. Note in this modified example that if the parameter
$\gamma$ were known, $\beta$ could be estimated with arbitrarily small variance
from one experiment by taking $x$ large enough. This interpretation of the
effect of unbounded $R$ applies to the general case, too. If some element of
$X$ is unbounded, the fact that $X$ is nonnegative definite implies that some
element of the main diagonal of $X$ is unbounded. If the $i$th element of the
main diagonal of $X$ is unbounded, $\theta_i$ can be estimated with arbitrarily
small asymptotic variance if all the other parameters are known.
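
The example lends itself to a quick numerical check. The following is a minimal sketch, assuming NumPy is available; the helper names `info_matrix` and `slope_variance` and the small regularizer `lam` are illustrative choices and do not come from the paper. It builds the information matrix of a single experiment at design point $x$, mixes the experiments at $x = 1$ and $x = -1$ with equal weights, and evaluates the first diagonal element of the inverse as a limit of $(X + \lambda I)^{-1}$.

```python
import numpy as np

def info_matrix(x):
    """Information matrix of one experiment at design point x,
    with parameters ordered as (slope, intercept)."""
    return np.array([[x * x, x], [x, 1.0]])

def slope_variance(X, lam=1e-9):
    """First diagonal element of (X + lam*I)^{-1}; for small lam this
    approximates the limiting inverse, which may be very large."""
    return np.linalg.inv(X + lam * np.eye(X.shape[0]))[0, 0]

# Equal mixture of the two end points of the parabola segment.
mixed = 0.5 * info_matrix(1.0) + 0.5 * info_matrix(-1.0)

print(mixed)                               # coordinates (1, 0, 1) as a matrix
print(slope_variance(mixed))               # close to 1
print(slope_variance(info_matrix(1.0)))    # huge: a single x cannot separate slope and intercept
```

The mixed matrix has coordinates $(1, 0, 1)$, in agreement with the mid-point described above, while a single pure experiment gives a singular matrix and an unbounded slope variance.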


5. Main results. In this section we state our main results. The proofs will
first be given for $s = 1$ and then extended to $s > 1$.

Theorem 1. If $R$ is closed and bounded there is an element $\bar{X}$ of $R$
which minimizes $v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}$ and which is a
convex linear combination of $r \le k + (k-1) + \cdots + (k-s+1)$ elements
$X_1, X_2, \dots, X_r$ of $R_1$. Furthermore $X_1, X_2, \dots, X_r$ may be
chosen so that $v_s(X)$ is a continuous function at $X = \bar{X}$ with respect
to the topology of the convex set generated by $X_1, X_2, \dots, X_r$.

We treat the case $s = 1$ where we let

(8)    $z(X) = v_1(X) = x^{11}$.

In outline, our proof for $s = 1$ consists in obtaining an expression for

    $\delta(X, \Delta) = z(X + \Delta) - z(X)$

which will be used to show the existence of an $X^{(0)} \in R$ which minimizes
$z(X)$ and such that $X^{(0)}$ lies on a supporting hyperplane of $R$. It will
also be evident that $z(X)$ is constant on a sub-hyperplane. The dimension of
this sub-hyperplane leads to the existence of $\bar{X}$ with the desired
properties. The complexity of the details of the proof arises mainly from
difficulties in the case that $X^{(0)}$ is singular, since $z(X)$ is not
continuous at singular $X$.
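Lemma 1. If $X$ and $X + \Delta$ are nonnegative definite symmetric matrices
and $z(X) \ne \infty$, then

(9)    $\delta(X, \Delta) = z(X + \Delta) - z(X) = -\epsilon(X, \Delta) + \eta(X, \Delta)$

where

(10)   $\epsilon(X, \Delta) = \lim_{\lambda \to 0+} \epsilon_\lambda(X, \Delta)$,

(11)   $\epsilon_\lambda(X, \Delta) = [(X + \lambda I)^{-1} \Delta (X + \lambda I)^{-1}]_{11}$,

(12)   $\eta(X, \Delta) = \lim_{\lambda \to 0+} \eta_\lambda(X, \Delta)$,

(13)   $\eta_\lambda(X, \Delta) = [(X + \lambda I)^{-1} \Delta (X + \Delta + \lambda I)^{-1} \Delta (X + \lambda I)^{-1}]_{11}$,

and $\epsilon(X, \Delta)$ is a linear function in $\Delta$ and $\eta(X, \Delta) \ge 0$.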
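Proof. Since the matrices $(X + \lambda I)$ and $(X + \Delta + \lambda I)$ are
positive definite for $\lambda > 0$,

(14)   $\delta(X, \Delta) = \lim_{\lambda \to 0+} [z(X + \Delta + \lambda I) - z(X + \lambda I)]$,

(15)   $(X + \Delta + \lambda I)^{-1} - (X + \lambda I)^{-1} = -(X + \lambda I)^{-1} \Delta (X + \lambda I)^{-1} + (X + \lambda I)^{-1} \Delta (X + \Delta + \lambda I)^{-1} \Delta (X + \lambda I)^{-1}$.

Since $z(X) \ne \infty$ it follows (see Appendix A, Property 1) that
$\lim_{\lambda \to 0+} [(X + \lambda I)^{-1}]_{1i}$ exists and is finite for
each $i$. Let us denote this limit by $X^{1i} = X^{i1}$. Hence

(16)   $\epsilon(X, \Delta) = \lim_{\lambda \to 0+} \epsilon_\lambda(X, \Delta) = \sum_{i,j=1}^{k} X^{1i} \Delta_{ij} X^{j1}$.

It follows that as $\lambda \to 0+$, $\eta_\lambda(X, \Delta)$ converges
(possibly to $+\infty$). Since the matrix, of which $\eta_\lambda(X, \Delta)$
is the element of the first row and first column, is nonnegative definite it
follows that $\eta(X, \Delta) \ge 0$.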
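
The decomposition of Lemma 1 can be checked numerically in the nonsingular case, where the limits are attained at $\lambda = 0$. The sketch below assumes NumPy; the function names and the randomly generated matrices are hypothetical inputs chosen only for illustration.

```python
import numpy as np

def random_spd(k, rng):
    """Random symmetric positive definite k x k matrix (illustrative input)."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)

def delta_epsilon_eta(X, D):
    """Return (delta, epsilon, eta) for positive definite X and X + D,
    taking lambda = 0 in the expressions for epsilon and eta."""
    Xi = np.linalg.inv(X)
    XDi = np.linalg.inv(X + D)
    delta = XDi[0, 0] - Xi[0, 0]           # z(X + D) - z(X)
    eps = (Xi @ D @ Xi)[0, 0]              # linear term
    eta = (Xi @ D @ XDi @ D @ Xi)[0, 0]    # nonnegative remainder
    return delta, eps, eta

rng = np.random.default_rng(0)
X = random_spd(4, rng)
D = random_spd(4, rng) - X                 # symmetric, and X + D is positive definite
delta, eps, eta = delta_epsilon_eta(X, D)
print(delta, -eps + eta)                   # the two values agree up to rounding
```

The remainder `eta` is a quadratic form in a positive definite matrix, which is why it cannot be negative.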

Lemma 2. If $X$ and $X + \Delta$ are nonnegative definite symmetric matrices
and $z(X) = \infty$, then $\lim_{\Delta \to 0} z(X + \Delta) = \infty$. (We
write $\Delta \to 0$ if each element of $\Delta$ approaches zero. Note that
$\Delta$ converges to zero subject to the condition that $X + \Delta$ is
nonnegative definite and symmetric.)

Proof. If $A$ and $B$ are symmetric matrices we use the notation $A \le B$ or
$B \ge A$ if $p'Ap \le p'Bp$ for every vector $p$. If $A$ and $B$ are positive
definite and $A \le B$, it is easily seen that $B^{-1} \le A^{-1}$ by
diagonalizing $B$ and $A$. Also,

    $z(X + \Delta) = \lim_{\lambda \to 0+} [(X + \Delta + \lambda I)^{-1}]_{11}$.

Let $d$ be the largest characteristic value of $\Delta$. Then

    $(X + \Delta + \lambda I) \le (X + dI + \lambda I)$,   $(X + \Delta + \lambda I)^{-1} \ge (X + dI + \lambda I)^{-1}$,

    $z(X + \Delta) \ge [(X + dI)^{-1}]_{11}$.

As $\Delta \to 0$, $d \to 0$. Furthermore $z(X) = \infty$. Hence
$\lim_{d \to 0} [(X + dI)^{-1}]_{11} = \infty$, and our lemma follows.

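Lemma 3. If $R$ is closed and bounded, $z(X)$ attains its minimum on $R$.

Proof. Let $w = \inf_{X \in R} z(X)$. Because $R$ is bounded, $w > 0$. The case
$w = \infty$ is trivial and hence we assume $0 < w < \infty$. Since $R$ is
closed and bounded there is a sequence $\{X^{(i)}\}$ such that $X^{(i)} \in R$,
$z(X^{(i)}) \to w$ and $\{X^{(i)}\}$ has a limit point $X^{(0)} \in R$. Let
$\Delta^{(i)} = X^{(i)} - X^{(0)}$. It suffices to show that $z(X^{(0)}) \le
w$. By Lemma 2, $z(X^{(0)}) \ne \infty$. Hence

    $z(X^{(0)} + \Delta^{(i)}) - z(X^{(0)}) = -\epsilon(X^{(0)}, \Delta^{(i)}) + \eta(X^{(0)}, \Delta^{(i)})$.

Since $\epsilon$ is linear in $\Delta^{(i)}$,

    $\lim_{i \to \infty} \epsilon(X^{(0)}, \Delta^{(i)}) = 0$.

But $\eta(X^{(0)}, \Delta^{(i)}) \ge 0$. Letting $i \to \infty$, we obtain
$w - z(X^{(0)}) \ge 0$.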

Hereafter we shall assume that $R$ is closed and bounded. Then let $p$ be the
lowest rank associated with those elements of $R$ which minimize $z(X)$. Now we
assume that $X^{(0)}$ minimizes $z(X)$ on $R$, $z(X^{(0)}) \ne \infty$ and
$X^{(0)}$ has rank $p$. We shall now reduce the set under consideration from
$R$ to $R \cap H_1$ where $H_1$ is a hyperplane containing $X^{(0)}$ and $H_1$
has dimension $p(p+1)/2$. In the event that $X^{(0)}$ is nonsingular, no
reduction from $R$ has been effected. We shall not consider the trivial case
$X^{(0)} = 0$ for then $w = \infty$.

We construct $H_1$ as follows. Corresponding to $X^{(0)}$, there is an
orthogonal matrix $P = \| p_{ij} \|$ such that

(17)   $X^{(0)} = P'EP$

where

(18)   $E = \begin{pmatrix} E_1 & 0 \\ 0 & 0 \end{pmatrix}$

and $E_1$ is a diagonal $p \times p$ matrix where all the elements on the main
diagonal are positive. We define $H_1$ as the set of $X$ for which

(19)   $P(X - X^{(0)})P' = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}$

where $D_1$ is a symmetric $p \times p$ matrix. It is evident that $H_1$ is a
$p(p+1)/2$ dimensional hyperplane containing $X^{(0)}$. We note that the
nonnull set $R \cap H_1$ is the convex hull of $R_1 \cap H_1$ and is also
closed and bounded.

Lemma 4. If $X^{(1)} \in R \cap H_1$, $X^{(1)}$ has rank $p$, $z(X^{(1)}) \ne
\infty$, and $X^{(1)} + \Delta \in H_1$, then $\eta(X^{(1)}, \nu\Delta)$
approaches zero at least quadratically as $\nu \to 0$ and $z(X)$ is continuous
at $X = X^{(1)}$ (in the topology of $H_1$).

Proof. Since $X^{(1)} \in R \cap H_1$ and $X^{(1)}$ has rank $p$,

    $P X^{(1)} P' = \begin{pmatrix} F_1 & 0 \\ 0 & 0 \end{pmatrix}$

where $F_1$ is a positive definite symmetric matrix. If $X^{(1)} + \Delta \in
H_1$,

    $P \Delta P' = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}$

where $D_1$ is symmetric. Let $p_1$ be the vector consisting of the first $p$
elements of the first column of $P$. If $F_1$ and $F_1 + \nu D_1$ are
nonnegative definite we have

(20)   $\eta_\lambda(X^{(1)}, \nu\Delta) = p_1'(F_1 + \lambda I)^{-1}(\nu D_1)(F_1 + \lambda I + \nu D_1)^{-1}(\nu D_1)(F_1 + \lambda I)^{-1} p_1$.

But $F_1$ is positive definite and for $\nu$ small enough $F_1 + \nu D_1$ is
also positive definite. Therefore,

(21)   $\eta(X^{(1)}, \nu\Delta) = p_1' F_1^{-1}(\nu D_1)(F_1 + \nu D_1)^{-1}(\nu D_1) F_1^{-1} p_1$,

and

    $\lim_{\nu \to 0} \eta(X^{(1)}, \nu\Delta)/\nu^2 = p_1' F_1^{-1} D_1 F_1^{-1} D_1 F_1^{-1} p_1 < \infty$.

Similarly one may obtain

(22)   $\epsilon(X^{(1)}, \nu\Delta) = p_1' F_1^{-1}(\nu D_1) F_1^{-1} p_1$.

The continuity of $z(X)$ at $X = X^{(1)}$ follows immediately from equations
(21) and (22).

Lemma 5. There is a sub-hyperplane $H_2$ of $H_1$ which is a supporting
hyperplane of $R \cap H_1$ at $X^{(0)}$. $H_2$ has dimension
$\tfrac{1}{2}p(p+1) - 1$.

Proof. Suppose $X = X^{(0)} + \Delta \in R \cap H_1$. By convexity

    $X^{(0)} + \nu\Delta \in R \cap H_1$   for $0 \le \nu \le 1$.

If $\epsilon(X^{(0)}, \Delta) > 0$, it follows from Lemma 4 and the linearity
of $\epsilon$ that $z(X^{(0)} + \nu\Delta) - z(X^{(0)}) < 0$ for small enough
positive $\nu$. This contradicts the fact that $X^{(0)}$ minimizes $z(X)$ on
$R$. Hence

    $\epsilon(X^{(0)}, X - X^{(0)}) \le 0$   for $X \in R \cap H_1$.

The sub-hyperplane $H_2$ of $H_1$ defined by the restriction

(23)   $\epsilon(X^{(0)}, X - X^{(0)}) = p_1' E_1^{-1} D_1 E_1^{-1} p_1 = 0$

is a supporting hyperplane of $R \cap H_1$ at $X^{(0)}$. The fact that equation
(23) actually constitutes a restriction on $X$ depends on the fact that $p_1
\ne 0$, and this in turn is easily established from $z(X^{(0)}) \ne \infty$,
which implies that the last $k - p$ elements of the first column of $P$ are all
zero.

Lemma 6. There is a sub-hyperplane $H_3$ of $H_2$ so that $z(X) = z(X^{(0)})$
for $X \in R \cap H_3$. The dimension of $H_2$ minus that of $H_3$ is no more
than $p - 1$.

Proof. For $X \in H_2$, $\epsilon(X^{(0)}, X - X^{(0)}) = 0$. From equation
(21) it follows that if $E_1 + D_1$ is nonsingular, the restriction

(24)   $p_1' E_1^{-1} D_1 = 0$

implies $\eta(X^{(0)}, \Delta) = 0$ and hence $z(X^{(0)} + \Delta) =
z(X^{(0)})$. This implication holds even if $E_1 + D_1$ is singular and
nonnegative definite. For then we may apply equation (20) with $F_1$ replaced
by $E_1$ and $\nu = 1$. We note that subject to restriction (24),
$p_1'(E_1 + \lambda I)^{-1} D_1 = O(\lambda)$. Furthermore
$(E_1 + \lambda I + D_1)^{-1} \le (1/\lambda) I$, whence
$\eta_\lambda(X^{(0)}, \Delta) = O(\lambda)$ and $z(X^{(0)} + \Delta) =
z(X^{(0)})$.

Equation (24) constitutes a set of at most $p$ linearly independent
restrictions on $X = X^{(0)} + \Delta$. However, since the restriction
$\epsilon(X^{(0)}, \Delta) = 0$ may be written $p_1' E_1^{-1} D_1 E_1^{-1} p_1
= 0$, it follows that on $H_2$ the restriction (24) constitutes a set of at
most $p - 1$ linearly independent restrictions. Let $H_3$ be the
sub-hyperplane of $H_2$ on which $p_1' E_1^{-1} D_1 = 0$. Lemma 6 follows.

Lemma 7. There is an element $\bar{X}$ of $R$ which minimizes $z(X)$ and which
is a convex combination of a set of $r \le p$ elements of $R_1 \cap H_2$.

Proof. The set $R \cap H_3$ is closed, convex and bounded. There exists at
least one element $\bar{X}$ of $R \cap H_3$ which is not a convex combination
of any two distinct elements of $R \cap H_3$. By Lemma 6, $z(\bar{X}) =
z(X^{(0)})$, that is, $\bar{X}$ minimizes $z(X)$ on $R$. The matrix $\bar{X}$
is an element of $H_2$ which supports $R \cap H_1$. Hence $\bar{X}$ is a convex
combination of elements of $R_1 \cap H_2$. Let $r$ be the least number of
elements of $R_1 \cap H_2$ which are required to yield $\bar{X}$ as a convex
combination. Then $\bar{X}$ is an interior point of $R \cap H_4$ where $H_4$ is
an $r - 1$ dimensional sub-hyperplane of $H_2$. Since $\bar{X}$ was selected so
that $\bar{X}$ is not an interior point of any line segment of $R \cap H_3$,
$H_4 \cap H_3$ must have dimension 0 and hence $r - 1 \le p - 1$.

Lemma 8. Theorem 1 is valid for $s = 1$.

Proof. Lemma 7 shows the existence of $\bar{X}$ and the continuity property is
given by Lemma 4.

Now that Theorem 1 has been established for $s = 1$, we shall extend the proof
for $s > 1$. In that case we change our notation slightly. We let

(25)   $z(X) = v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}$
(26)   $\delta(X, \Delta) = z(X + \Delta) - z(X)$
(27)   $\epsilon_\lambda(X, \Delta) = \epsilon_\lambda^{(1)}(X, \Delta) + \epsilon_\lambda^{(2)}(X, \Delta) + \cdots + \epsilon_\lambda^{(s)}(X, \Delta)$
(28)   $\epsilon(X, \Delta) = \lim_{\lambda \to 0+} \epsilon_\lambda(X, \Delta)$
(29)   $\eta_\lambda(X, \Delta) = \eta_\lambda^{(1)}(X, \Delta) + \eta_\lambda^{(2)}(X, \Delta) + \cdots + \eta_\lambda^{(s)}(X, \Delta)$
(30)   $\eta(X, \Delta) = \lim_{\lambda \to 0+} \eta_\lambda(X, \Delta)$

where $\epsilon_\lambda^{(i)}(X, \Delta)$ and $\eta_\lambda^{(i)}(X, \Delta)$
are obtained from the $i$th diagonal terms of the matrices appearing in
equations (11) and (13), respectively. Then Lemmas 1, 2, 3, and 4 may be
established as in the case for $s = 1$. Equations (20), (21), and (22) are
slightly modified. To illustrate, equation (20) becomes

(31)   $\eta_\lambda(X^{(1)}, \nu\Delta) = \sum_{i=1}^{s} p_1^{(i)\prime} (F_1 + \lambda I)^{-1}(\nu D_1)(F_1 + \lambda I + \nu D_1)^{-1}(\nu D_1)(F_1 + \lambda I)^{-1} p_1^{(i)}$

where $p_1^{(i)}$ is the vector whose components are the first $p$ elements of
the $i$th column of $P$. It will be useful to note later that the condition
$z(X^{(0)}) \ne \infty$ implies that the last $k - p$ elements of the first $s$
columns of $P$ are all zeros. In fact, $p_1^{(1)}, p_1^{(2)}, \dots, p_1^{(s)}$
are then unit orthogonal vectors.
Lemma 5 follows as before with the restriction defining $H_2$ replaced by

(32)   $\epsilon(X^{(0)}, X - X^{(0)}) = \sum_{i=1}^{s} p_1^{(i)\prime} E_1^{-1} D_1 E_1^{-1} p_1^{(i)} = 0$.

In Lemma 6, the wording must be modified so that the dimension of $H_2$ minus
that of $H_3$ is no more than $p + (p-1) + \cdots + (p-s+1) - 1$. The change
is due to the fact that restriction (24) is now replaced by

(33)   $P_1' E_1^{-1} D_1 = 0$


where $P_1$ is the $(p \times s)$ matrix of rank $s$ consisting of the first
$p$ rows and $s$ columns of $P$. It is possible to rearrange the rows and
columns of $D_1$ (maintaining symmetry) so that equation (33) may be expressed
by

(34)   $(Q_{11} \; Q_{12}) \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix} = 0$

where $Q_{11}$ is nonsingular, $Q_{11}$, $Q_{12}$, $D_{11}$, $D_{12} = D_{21}'$
and $D_{22}$ are of orders $s \times s$, $s \times (p - s)$, $s \times s$,
$s \times (p - s)$ and $(p - s) \times (p - s)$, respectively, and

    $\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}$

is the result of rearranging the rows and columns of $D_1$. But then

    $D_{12} = -Q_{11}^{-1} Q_{12} D_{22}$,
    $D_{11} = -Q_{11}^{-1} Q_{12} D_{21} = Q_{11}^{-1} Q_{12} D_{22} Q_{12}' (Q_{11}')^{-1}$.

Hence, after the restriction (33), $D_1$ is determined by $D_{22}$ and has only
$(p - s)(p - s + 1)/2$ linearly independent elements. Hence, equation (33)
imposes

    $\frac{p(p+1)}{2} - \frac{(p-s)(p-s+1)}{2} = \frac{s(2p - s + 1)}{2} = p + (p-1) + \cdots + (p - s + 1)$

independent linear restrictions on the symmetric matrix $D_1$. But as before
one of these restrictions is lost on $H_2$, for (33) implies
$\epsilon(X^{(0)}, \Delta) = 0$. Lemma 7, with $p$ replaced by $p + (p-1) +
\cdots + (p - s + 1)$, follows as before. Theorem 1 is once more an immediate
consequence of Lemmas 4 and 7.
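
The count of independent restrictions can be confirmed numerically. The sketch below assumes NumPy; the function names are hypothetical, and a random $s \times p$ matrix `Q` is used as a stand-in for $P_1' E_1^{-1}$ (an assumption made only so that the example is self-contained, relying on such a matrix having rank $s$). It computes the rank of the linear map $D \mapsto QD$ on symmetric $p \times p$ matrices and compares it with $s(2p - s + 1)/2$.

```python
import numpy as np

def restriction_count(p, s, seed=0):
    """Rank of the linear map D -> Q D restricted to symmetric p x p matrices D,
    where Q is a random s x p matrix (stand-in for P_1' E_1^{-1})."""
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((s, p))
    images = []
    for i in range(p):
        for j in range(i, p):
            B = np.zeros((p, p))
            B[i, j] = B[j, i] = 1.0        # basis element of the symmetric matrices
            images.append((Q @ B).ravel())
    return int(np.linalg.matrix_rank(np.array(images)))

def closed_form(p, s):
    """p + (p - 1) + ... + (p - s + 1) = s(2p - s + 1)/2."""
    return s * (2 * p - s + 1) // 2

print(restriction_count(4, 2), closed_form(4, 2))
print(restriction_count(5, 3), closed_form(5, 3))
```

With $s = 1$ the count reduces to $p$, the number of restrictions noted for equation (24).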


6. Remarks. In many cases, the cost of experimentation depends on the 
experiment. Then the usual design problem is to maximize information, given 
a certain amount of money to spend on experimentation. Our results of Section 
5 are easily seen to apply in this case, too. Here we identify with each
experiment a matrix

(35)   $Y(\theta) = X(\theta)/c$

where $c$ is the cost of performing the experiment. The matrix $Y(\theta)$
represents information per unit cost. The matrix which we associate with the
mixed experiment when $E_i$ is carried out with probability $p_i$, $i = 1, 2,
\dots, n$, is

(36)   $\bar{Y}(\theta) = \frac{\sum_{i=1}^{n} p_i X_i(\theta)}{\sum_{i=1}^{n} p_i c_i} = \frac{\sum_{i=1}^{n} c_i p_i Y_i(\theta)}{\sum_{i=1}^{n} c_i p_i}$.

It is evident that a reasonable criterion for a good mixed experiment is that
$v_s[\bar{Y}(\theta)]$ be minimized.
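
As an illustration of the matrix associated with a mixed experiment under unequal costs, the following sketch (assuming NumPy; the function name, the two example matrices and the costs are hypothetical) evaluates the expected information divided by the expected cost and checks that the two forms of the expression agree.

```python
import numpy as np

def mixed_information_per_cost(X_list, costs, probs):
    """Expected information divided by expected cost for a mixed experiment."""
    X = np.asarray(X_list, dtype=float)     # shape (n, k, k)
    c = np.asarray(costs, dtype=float)      # shape (n,)
    p = np.asarray(probs, dtype=float)      # shape (n,), sums to 1
    expected_information = np.tensordot(p, X, axes=1)
    return expected_information / np.dot(p, c)

# Two illustrative experiments with costs 1 and 3, each used half the time.
X1 = np.array([[1.0, 1.0], [1.0, 1.0]])
X2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
costs = [1.0, 3.0]
probs = [0.5, 0.5]

Y_bar = mixed_information_per_cost([X1, X2], costs, probs)

# Second form: weight the per-unit-cost matrices Y_i = X_i / c_i by c_i * p_i.
weights = np.array(costs) * np.array(probs)
Y_alt = (weights[0] * X1 / costs[0] + weights[1] * X2 / costs[1]) / weights.sum()
print(Y_bar)
print(np.allclose(Y_bar, Y_alt))
```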
In [1], Elfving obtained our result (Theorem 1) for s = 1 and s = k in the 
case of linear regression. Elfving also indicated an elegant geometrical method 
of obtaining the optimal design. The methods used by Elfving depend only on 


the assumption that for any nonrandomized experiment, $X(\theta)$ may be
represented in the form

(37)   $X(\theta) = \| x_{ij}(\theta) \| = \| x_i(\theta) x_j(\theta) \|$.


Hence, these methods may be applied in many examples which are not regres- 
sion problems. 


7. Examples. In this section we shall discuss some examples in order to show 
how the results of Section 5 may be used to reduce considerably the amount of 
work required to obtain optimal designs of experiments. The results of Elfving 
[1] make it unnecessary for us to consider the important and interesting examples 
from linear regression theory. 

EXAMPLE 1. Suppose that

(38)   $p_d = e^{-\theta d}$,   $\theta > 0$, $d \ge 0$,

represents the probability that an insect will survive a dose of $d$ units of a
certain insecticide. It is desired to select $n$ dose levels to try on $n$
insects to estimate $\theta$ in an optimal fashion.

Here the information matrix corresponding to a particular $d$ is given by

(39)   $X_d = d^2 e^{-\theta d} / (1 - e^{-\theta d})$,   $d > 0$,

and

       $X_d = 0$,   $d = 0$.


The conditions of Theorem 1 are satisfied and hence it follows that a locally
optimal design consists of repeating one dose level $n$ times. Maximizing $X_d$
we find that this dose level satisfies

(40)   $d = 1.59/\theta$.

For this locally optimal dose level, the probability of survival $p_d$ is very
close to 0.2. An interesting by-product is that for the general design the
maximum likelihood estimates are not too simple to obtain or study. For the
optimal design the estimation problem is that of estimating the probability
associated with a binomial distribution.²
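
The optimal dose level can be recovered numerically. Writing $u = \theta d$, the information $d^2 e^{-\theta d}/(1 - e^{-\theta d})$ is proportional to $u^2/(e^u - 1)$, and setting its derivative to zero gives $2(1 - e^{-u}) = u$. The sketch below uses only the standard library; the function names and the bisection bracket $(1, 2)$ are illustrative choices.

```python
import math

def information(d, theta):
    """Information about theta from one insect given dose d."""
    if d == 0:
        return 0.0
    q = math.exp(-theta * d)               # survival probability
    return d * d * q / (1.0 - q)

def optimal_dose(theta, tol=1e-12):
    """Solve 2(1 - exp(-u)) = u for u = theta * d by bisection on (1, 2)."""
    f = lambda u: 2.0 * (1.0 - math.exp(-u)) - u
    lo, hi = 1.0, 2.0                      # f(1) > 0 and f(2) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / theta

d_star = optimal_dose(1.0)
print(d_star)                              # optimal theta * d
print(math.exp(-d_star))                   # survival probability at the optimum
```

For any $\theta$ the product $\theta d$ at the optimum is the same, so the survival probability at the locally optimal dose does not depend on $\theta$.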

EXAMPLE 2. Let $A$ and $B$ represent two characteristics that members of a
population may or may not have, for example, the habit of smoking and heart
disease. Let $\bar{A}$ and $\bar{B}$ represent the complementary
characteristics. It is desired to estimate the degree of dependence of the two
characteristics $A$ and $B$. Five experiments may be performed. These
correspond to examining individuals either: (i) at random; (ii) with
characteristic $A$; (iii) with characteristic $\bar{A}$; (iv) with
characteristic $B$; or (v) with characteristic $\bar{B}$. The parameters
involved are $p_A$, $p_{\bar{A}} = 1 - p_A$, $p_B$, $p_{\bar{B}} = 1 - p_B$,
and $\theta = p_{AB} - p_A p_B$, where $p$ with a subscript indicates the
proportion of the population with the characteristics of the subscript. There
are three independent parameters.

In the case where $p_A$ and $p_B$ are known, it has been shown by Blackwell [4]
that to test for independence an optimal design involves repeating one
experiment $n$ times. This experiment is the one which corresponds to the
smallest of the four probabilities $p_A$, $p_{\bar{A}}$, $p_B$, and
$p_{\bar{B}}$. Here, Theorem 1 may be applied to yield the same result if it is
desired to estimate $\theta$ when $\theta$ is assumed to be close to zero.

Suppose, now, that our problem is modified so that $p_B$ is only approximately
known. Here, Theorem 1 applies with $k = 2$, $s = 1$, and tells us that we
should use at most two of the experiments. Furthermore, since selecting an
individual at random is equivalent to a mixture of two of the other experiments
we may confine our attention only to pairs of the other four experiments. Let
us now evaluate the information matrix $X_A$ corresponding to examining an
individual with characteristic $A$, this information matrix to be evaluated at
$\theta = 0$. In this experiment the probability of observing characteristic
$B$ is $p_B + \theta/p_A$. If the individual observed has characteristic $B$,
then at $\theta = 0$

    $L = \log(p_B + \theta/p_A)$,   $\partial L/\partial\theta = 1/(p_A p_B)$,   $\partial L/\partial p_B = 1/p_B$.

2 The author wishes to express his thanks to Fred Andrews for his assistance on this 
example. 







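If the individual observed has characteristic $\bar{B}$, then at $\theta = 0$

    $L = \log(1 - p_B - \theta/p_A)$,   $\partial L/\partial\theta = -1/(p_A p_{\bar{B}})$,   $\partial L/\partial p_B = -1/p_{\bar{B}}$.

Hence

(41)   $X_A = \frac{1}{p_A p_{\bar{A}} p_B p_{\bar{B}}} \begin{pmatrix} p_{\bar{A}}/p_A & p_{\bar{A}} \\ p_{\bar{A}} & p_A p_{\bar{A}} \end{pmatrix}$.

Similarly

(42)   $X_{\bar{A}} = \frac{1}{p_A p_{\bar{A}} p_B p_{\bar{B}}} \begin{pmatrix} p_A/p_{\bar{A}} & -p_A \\ -p_A & p_A p_{\bar{A}} \end{pmatrix}$,

(43)   $X_B = \frac{1}{p_A p_{\bar{A}} p_B p_{\bar{B}}} \begin{pmatrix} p_{\bar{B}}/p_B & 0 \\ 0 & 0 \end{pmatrix}$,

and

(44)   $X_{\bar{B}} = \frac{1}{p_A p_{\bar{A}} p_B p_{\bar{B}}} \begin{pmatrix} p_B/p_{\bar{B}} & 0 \\ 0 & 0 \end{pmatrix}$.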

From the remarks of the previous section, it follows that Elfving's results [1]
may be applied. The geometrical figure that is developed shows immediately
that the optimal design consists of using either $B$, or $\bar{B}$, or $A$ and
$\bar{A}$ each half the time, according as to which of the numbers

    $\sqrt{p_{\bar{B}}/p_B}$,   $\sqrt{p_B/p_{\bar{B}}}$,   $1/(2\sqrt{p_A p_{\bar{A}}})$

is greatest. This last result can also be obtained directly without
computational difficulty.
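
A direct numerical comparison of the three candidate designs is sketched below, assuming NumPy. The information matrices are written for the parameter order $(\theta, p_B)$ at $\theta = 0$ and are derived here from the log likelihoods of the four conditional experiments; that derivation, the function names, and the example values $p_A = 0.3$, $p_B = 0.1$ are assumptions of this sketch rather than material quoted from the paper.

```python
import math
import numpy as np

def theta_variance(X, lam=1e-9):
    """First diagonal element of (X + lam*I)^{-1}, approximating the limiting inverse."""
    return np.linalg.inv(X + lam * np.eye(2))[0, 0]

def design_variances(pA, pB):
    """Asymptotic variance of the estimate of theta (per observation) for three designs."""
    qA, qB = 1.0 - pA, 1.0 - pB
    XA = np.array([[1 / pA**2, 1 / pA], [1 / pA, 1.0]]) / (pB * qB)
    XAbar = np.array([[1 / qA**2, -1 / qA], [-1 / qA, 1.0]]) / (pB * qB)
    XB = np.array([[1 / (pA * qA * pB**2), 0.0], [0.0, 0.0]])
    XBbar = np.array([[1 / (pA * qA * qB**2), 0.0], [0.0, 0.0]])
    return {
        "B": theta_variance(XB),
        "Bbar": theta_variance(XBbar),
        "A and Abar": theta_variance(0.5 * (XA + XAbar)),
    }

def comparison_numbers(pA, pB):
    """The three numbers compared in the text, in the same order as the designs."""
    qA, qB = 1.0 - pA, 1.0 - pB
    return {
        "B": math.sqrt(qB / pB),
        "Bbar": math.sqrt(pB / qB),
        "A and Abar": 1.0 / (2.0 * math.sqrt(pA * qA)),
    }

variances = design_variances(0.3, 0.1)
numbers = comparison_numbers(0.3, 0.1)
print(min(variances, key=variances.get), max(numbers, key=numbers.get))
```

In this sketch each variance equals $p_A p_{\bar{A}} p_B p_{\bar{B}}$ divided by the square of the corresponding number, so the design with the greatest number has the smallest variance.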


8. Appendices.
APPENDIX A. Extension of the inverse to nonnegative definite symmetric
matrices.
Here we extend the notion of the inverse of a matrix to nonnegative definite
symmetric matrices and show how this extension has statistical significance.
Suppose that $X$ is nonnegative definite and symmetric. Let $Y$ be a symmetric
matrix so that $X + \lambda Y$ is positive definite for positive $\lambda$
sufficiently small. Then we define the inverse of $X$ relative to $Y$ by

(45)   $X_Y^{-1} = \lim_{\lambda \to 0+} (X + \lambda Y)^{-1}$.

The usefulness of this definition arises mainly from the following property.

Property 1. The diagonal elements of $X_Y^{-1}$ are independent of $Y$.
Furthermore, if the $i$th and $j$th diagonal elements of $X_Y^{-1}$ are finite,
the $(i, j)$ element of $X_Y^{-1}$ is finite and independent of $Y$. If the
$i$th diagonal element of $X_Y^{-1}$ is finite, all the elements of the $i$th
row of $X_Y^{-1}$ are finite.

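Proof. Corresponding to $X$ there is an orthogonal matrix $P$ such that

(46)   $P'XP = E = \begin{pmatrix} E_{11} & 0 \\ 0 & 0 \end{pmatrix}$

where $E_{11}$ is a diagonal $p \times p$ matrix whose diagonal elements $e_1,
e_2, \dots, e_p$ are positive. We define $F$ by

(47)   $P'YP = F = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}$.

Then $F_{22}$ is positive definite and

    $(X + \lambda Y)^{-1} = P(E + \lambda F)^{-1}P'$

    $= P \begin{pmatrix} E_{11}^{-1} + O(\lambda) & -[E_{11}^{-1} + O(\lambda)] F_{12} F_{22}^{-1} \\ -F_{22}^{-1} F_{21} [E_{11}^{-1} + O(\lambda)] & \lambda^{-1} [F_{22} - \lambda F_{21} (E_{11} + \lambda F_{11})^{-1} F_{12}]^{-1} \end{pmatrix} P'$.

Let $p_{i1}$ and $p_{i2}$ represent the first $p$ and the remaining $k - p$
elements of the $i$th row of $P$. Then

(48)   $[(X + \lambda Y)^{-1}]_{ij} = p_{i1}' E_{11}^{-1} p_{j1} - p_{i2}' F_{22}^{-1} F_{21} E_{11}^{-1} p_{j1} - p_{i1}' E_{11}^{-1} F_{12} F_{22}^{-1} p_{j2} + \lambda^{-1} p_{i2}' F_{22}^{-1} p_{j2} + p_{i2}' F_{22}^{-1} F_{21} E_{11}^{-1} F_{12} F_{22}^{-1} p_{j2} + O(\lambda)$.

Suppose that $\lim_{\lambda \to 0+} [(X + \lambda Y)^{-1}]_{ii}$ is finite.
Then $p_{i2} = 0$ and

(49)   $\lim_{\lambda \to 0+} [(X + \lambda Y)^{-1}]_{ii} = p_{i1}' E_{11}^{-1} p_{i1} = \sum_{h=1}^{p} p_{ih}^2 / e_h$

which is independent of $Y$. Also

(50)   $\lim_{\lambda \to 0+} [(X + \lambda Y)^{-1}]_{ij} = p_{i1}' E_{11}^{-1} p_{j1} - p_{i1}' E_{11}^{-1} F_{12} F_{22}^{-1} p_{j2}$

which is finite (though it depends on $Y$). Suppose that the limits of
$[(X + \lambda Y)^{-1}]_{ii}$ and $[(X + \lambda Y)^{-1}]_{jj}$ are finite.
Then $p_{i2} = 0$, $p_{j2} = 0$, and

(51)   $\lim_{\lambda \to 0+} [(X + \lambda Y)^{-1}]_{ij} = p_{i1}' E_{11}^{-1} p_{j1} = \sum_{h=1}^{p} p_{ih} p_{jh} / e_h$

which is independent of $Y$.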
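
Property 1 can be illustrated numerically. The sketch below assumes NumPy; the singular matrix `X`, the two choices of `Y` and the small value of `lam` are hypothetical inputs chosen for illustration. The first two diagonal elements of the relative inverse are finite and agree for both choices of $Y$, the third diagonal element grows without bound, and the $(1, 3)$ element is finite but depends on $Y$.

```python
import numpy as np

def relative_inverse(X, Y, lam=1e-7):
    """Approximate the inverse of X relative to Y by (X + lam*Y)^{-1} for small lam."""
    return np.linalg.inv(X + lam * Y)

# Rank-2 nonnegative definite matrix; its null space is spanned by the third axis.
X = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])
Y1 = np.eye(3)
Y2 = np.array([[1.0, 0.2, 0.3],
               [0.2, 1.5, -0.4],
               [0.3, -0.4, 2.0]])          # positive definite (diagonally dominant)

inv1 = relative_inverse(X, Y1)
inv2 = relative_inverse(X, Y2)

print(inv1[0, 0], inv2[0, 0])              # both close to 2/3
print(inv1[0, 1], inv2[0, 1])              # both close to -1/3
print(inv1[2, 2], inv2[2, 2])              # both of order 1/lam
print(inv1[0, 2], inv2[0, 2])              # finite, but different for Y1 and Y2
```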

Let us now assume that the probability distribution of the data of an
experiment depends on $\phi = (\phi_1, \phi_2, \dots, \phi_a)$ and that the
information matrix with respect to $\phi$ is positive definite. Let us assume
that the above distribution is independent of $\psi = (\psi_1, \psi_2, \dots,
\psi_b)$. Suppose now that the parameters in which we are interested are
$\theta = (\theta_1, \theta_2, \dots, \theta_c)$ and $\eta = (\eta_1, \eta_2,
\dots, \eta_d)$ where $a + b = c + d$ and there is a one to one relationship
between $(\phi, \psi)$ and $(\theta, \eta)$. In fact, suppose that

(53)   $\theta = \theta(\phi)$

(54)   $\eta = \eta(\phi, \psi)$

where the Jacobian of the transformation is not zero and where for each
component of $\eta$ the partial derivative with respect to some component of
$\psi$ does not vanish. We also suppose that the likelihood may be expressed in
terms of $(\theta, \eta)$, that is,

(55)   $L = u(\phi) = w(\theta, \eta)$.

We are interested in the following information matrices:

(56)   $U = E\{u_\phi' u_\phi\}$

(57)   $W = E \begin{pmatrix} w_\theta' w_\theta & w_\theta' w_\eta \\ w_\eta' w_\theta & w_\eta' w_\eta \end{pmatrix} = \begin{pmatrix} W_{\theta\theta} & W_{\theta\eta} \\ W_{\eta\theta} & W_{\eta\eta} \end{pmatrix}$

where $u_\phi$ is a row vector whose $i$th component is $\partial u/\partial
\phi_i$. We shall also use the notation $\phi_\theta$ to denote an $a \times c$
matrix whose $(i, j)$ element is $\partial\phi_i/\partial\theta_j$. We assume
that $U$ is positive definite and $U^{-1}$ represents a covariance matrix
$\Sigma_{\phi\phi}$. For our extension of the notion of the inverse of a matrix
to be suitable, it should yield for us the following property.

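Property 2. The matrix $W^{-1}$ may be decomposed as follows:

(58)   $W^{-1} = \begin{pmatrix} W^{\theta\theta} & W^{\theta\eta} \\ W^{\eta\theta} & W^{\eta\eta} \end{pmatrix}$

where $W^{\theta\theta}$ is uniquely defined and is given by

(59)   $W^{\theta\theta} = \theta_\phi \Sigma_{\phi\phi} \theta_\phi' = \Sigma_{\theta\theta}$

and where the diagonal elements of $W^{\eta\eta}$ are infinite.

Proof.

    $W = A' \begin{pmatrix} U & 0 \\ 0 & 0 \end{pmatrix} A$

where

    $A = \begin{pmatrix} \phi_\theta & \phi_\eta \\ \psi_\theta & \psi_\eta \end{pmatrix}$

is nonsingular. Furthermore

    $\lim_{\lambda \to 0+} \left[ W + \lambda A' \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix} A \right]^{-1} = \begin{pmatrix} \theta_\phi \Sigma_{\phi\phi} \theta_\phi' & \theta_\phi \Sigma_{\phi\phi} \eta_\phi' \\ \eta_\phi \Sigma_{\phi\phi} \theta_\phi' & \eta_\phi \Sigma_{\phi\phi} \eta_\phi' + \lim_{\lambda \to 0+} \lambda^{-1} \eta_\psi \eta_\psi' \end{pmatrix}$.

Property 1, together with the fact that not all components of $\eta_\psi$
vanish, yields our desired results.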


APPENDIX B. Justification for the use of information matrices. We sketch here a
brief justification for the use of information matrices in our formulation. This 
justification presupposes that we are interested in the variances of the asymptotic 
distribution of the estimate based on our design. Rubin has shown [5] that under 
mild conditions these variances are greater than or equal to the diagonal ele- 
ments of the inverse of the information matrix. On the other hand, if the design 
involves repeating a fixed number of these experiments in certain proportions, 
one (again under mild conditions) obtains equality. Since the “optimal” design 
using the information criterion involves repeating a fixed number of experi- 
ments in certain proportions, the sum of the variances of the asymptotic dis- 
tributions of the estimates with this “optimal” design is actually equal to the 
minimum $v_s$, which is a lower bound for the sum of the variances of the asymptotic
distributions of the estimates for all designs. 

APPENDIX C. The relevance of sums of variances. If one is interested in the
parameters $\theta_1, \theta_2, \dots, \theta_s$, it may be assumed that for a
given estimate $t_1, t_2, \dots, t_s$ there is a loss represented by a function

(60)   $g(t, \theta) = g(t_1, t_2, \dots, t_s, \theta_1, \theta_2, \dots, \theta_s)$

which as a function of the $t_i$ is a minimum at $t_i = \theta_i$. If we assume
that $g$ is sufficiently well-behaved and that the sample is large enough so
that the $t_i$ are close to $\theta_i$ with large probability,

    $g(t, \theta) = g(\theta, \theta) + \sum_{i,j=1}^{s} a_{ij}(\theta)(t_i - \theta_i)(t_j - \theta_j) + o(|t - \theta|^2)$.

The “value” of our statistic is measured by how small $E\{g(t, \theta)\}$ is.
For large samples (size $n$) we have, under mild conditions,

    $E\{g(t, \theta)\} = g(\theta, \theta) + \frac{1}{n} \sum_{i,j=1}^{s} a_{ij} \sigma_{ij} + o(1/n)$

where $\| \sigma_{ij} \|$ is the covariance matrix of the asymptotic
distribution of $\sqrt{n}(t - \theta)$ and

    $a_{ij} = \frac{1}{2} \frac{\partial^2 g(\theta, \theta)}{\partial t_i \, \partial t_j}$.

A reasonable criterion of a good statistic $t$ should then be that it minimizes

(61)   $\sum_{i,j=1}^{s} a_{ij} \sigma_{ij}$.

We now note that since $g$ is minimized at $t = \theta$, the matrix $A = \|
a_{ij} \|$ should be nonnegative definite. If $A$ has rank $p$, it is possible
to reduce the above expression to $\sum_{i=1}^{p} \sigma_{ii}$ by a linear
transformation on $\theta$. Ordinarily one would expect $p = s$ if one is
interested in $s$ parameters.
THE DISTRIBUTION OF QUASI-RANGES IN SAMPLES
FROM A NORMAL POPULATION 


By J. H. CADWELL 
Ordnance Board, Great Britain’ 


1. Summary. A method is developed for the evaluation of the probability
density function of the statistic:

    $w_r = x_{n-r} - x_{r+1}$

where $x_1, x_2, \dots, x_n$ are ordered values in a sample of $n$ from a
normal population.

It is shown that, up to $n = 17$, $w_0$ is the most efficient statistic of this
type for the estimation of population standard deviation. Beyond this point
$w_1$ is optimum up to $n = 31$, where $w_2$ becomes better. Tables of moment
constants and percentage points are given for $w_1$ over the range 10 to 30.

Similar methods are used to determine the efficiencies of two estimates of the
form $w_r + \lambda w_s$.
The approximation used is compared with three other published approxima- 
tions in the case of range (r = 0). 

Godwin [5] and Nair [11] have discussed problems of this kind for sample 
sizes up to 10, using exact values of the first two moments. Karl Pearson [12], 
Mosteller [10] and Jones [9] have considered the large sample case. The methods 


of the present paper go some way towards filling the gap between these ap- 
proaches. Moreover, they are not restricted to consideration of mean and vari- 
ance only. 


2. Introduction. The use of range as a rapid means of estimating population 
standard deviation is usually restricted to sample sizes below 20. There are 
several reasons for this restriction. Beyond this point the efficiency of such an 
estimate, when compared with one based on sample standard deviation, falls 
off rapidly. For larger samples the ratio of mean sample range to population 
standard deviation depends rather critically on the form of the tails of the parent 
distribution. Thus estimates based upon a normal model may be misleading. 
Finally the probability of the presence of a “rogue” observation will increase
with the sample size. Such a freak observation is likely to lead to an unusually 
large value of range. 

One method of overcoming these drawbacks consists in splitting the sample 
into smaller groups and finding the average range of these groups. Such a process 
is not unique and this may be a drawback in certain circumstances. In addition, 
unless the order in which values are recorded is known to be independent of these
values, a randomising process is necessary. This will rob the method of speed, 
one of the main assets of range as an estimator of standard deviation. 

In a recent paper Mosteller [10] drew attention to the possible use of $w_r$,
calling such statistics quasi-ranges. He mentioned an earlier result of K.
Pearson's [12] concerning the use of inter-quantile distances as estimators in
large samples from a normal population. In this case an optimum efficiency of
65 per cent is attained when points at a proportion 0.07 of the sample size
from either end are used. For small samples, range is the most efficient
estimator of this kind. Evidently, as $n$ increases, the optimum value of $r$
will cease to be zero at some point. Since $w_1$ does not depend on the values
of the extreme observations, it is likely to be less affected by departures
from normality or by the possible presence of an occasional “rogue”
observation. Thus it should be preferable to range beyond a certain sample
size.

Godwin [4], [5], in discussing a more general type of estimator, found the
first two moments of $w_r$ for values of $n$ up to 10. His method depends on a
series of double quadratures and becomes very laborious as $n$ increases. Below
we find an asymptotic series for the p.d.f. of $w_r$. As sometimes happens
(e.g. with Stirling's series) results of high accuracy are obtained for small
as well as for large $n$.


3. Derivation of the asymptotic series. It is shown in [1] that a series expansion 
of the appropriate integrand leads to close approximations to moments of quan- 
tiles in the normal case. Thus, for odd n, the median of a set of n values has a 
variance close to: 


ty Mea ae 0), 
(w + 2n — 2) | (x + 2n — 2)? 


For n = 3 this is in error by 0.001, and as n increases the error rapidly sinks to 
zero. 
A similar method can be applied to the p.d.f. of $w_r$ when the parent
distribution is symmetrical. We denote the p.d.f. of the parent by $\phi(x)$,
and need the functions:


(x) 


(d\' 
A (7) = ( 4 e(x) / #12), 
OU 


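The p.d.f. of $w_r$ will be given by the integral:

(1)    $\frac{n!}{(r!)^2 (n - 2r - 2)!} \int_{-\infty}^{\infty} [\tfrac{1}{2} + \Phi(x)]^r [\tfrac{1}{2} - \Phi(x + w_r)]^r [\Phi(x + w_r) - \Phi(x)]^{n - 2r - 2} \phi(x) \phi(x + w_r) \, dx$,

in which $\tfrac{1}{2} + \Phi(x)$ is the distribution function of the parent.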
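
For a normal parent the exact integral for the p.d.f. of $w_r$ can be evaluated directly by quadrature, which gives a reference against which an approximation may be compared. The following sketch uses only the standard library; the function names, the integration limits and the step counts are illustrative assumptions, and the trapezoidal rule is simply a convenient choice here.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quasi_range_pdf(w, n, r, lo=-8.0, hi=8.0, steps=640):
    """Trapezoidal evaluation of the exact p.d.f. of w_r at the point w (needs n >= 2r + 2)."""
    if w < 0.0:
        return 0.0
    const = math.factorial(n) / (math.factorial(r) ** 2 * math.factorial(n - 2 * r - 2))
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f1, f2 = cdf(x), cdf(x + w)
        term = (f1 ** r) * ((1.0 - f2) ** r) * (max(f2 - f1, 0.0) ** (n - 2 * r - 2))
        term *= phi(x) * phi(x + w)
        total += term if 0 < i < steps else 0.5 * term
    return const * total * h

def mean_and_area(n, r, w_max=10.0, steps=400):
    """Mean of w_r and total area under the p.d.f., both by the trapezoidal rule."""
    h = w_max / steps
    area = mean = 0.0
    for i in range(steps + 1):
        w = i * h
        f = quasi_range_pdf(w, n, r)
        weight = h if 0 < i < steps else 0.5 * h
        area += weight * f
        mean += weight * w * f
    return mean, area

if __name__ == "__main__":
    print(mean_and_area(10, 0))            # mean range for n = 10 and total area
```

The double loop makes each p.d.f. value cost one full pass over the $x$ grid, which is the expense that a series approximation avoids.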
Since $\phi(x)$ is symmetrical, this integrand will have its maximum value at
$x = -\tfrac{1}{2}w_r$, and will fall rapidly to zero on either side of this
point. This suggests expanding the integrand in terms of: $t = x +
\tfrac{1}{2}w_r$.

Except where otherwise specified, wherever a Greek function letter appears
the argument will be $\tfrac{1}{2}w_r$. Consequently this argument is omitted
in the interests of simplicity.


After taking logarithms of the appropriate series, and again expanding, we 
find: 


(2) (x + w,) — &(x) = 2b exp {300 + ---} 


(3) [3 + &(x))[3 — (2 + w,)] = G — 0)’ exp (-+ ye + 
(4) elaele + w) = exp -|(£) -“]e +.) 
\ L\e ¢ 


Thus, apart from a constant factor, the integrand can be written in the form: 


9 


) 


(1+ At + BE +--+} exp —<(£) —*F 4ry>+y) — in — 2r — 2068. 
\¢ ¢ 


Using the form of Watson’s lemma given by Jeffreys [8], we see that term- 
by-term integration of this series yields an asymptotic series for the p.d.f. The 
first term of this series is: 


1 2r 9 n—2r-—2 
flw,) ~ a — &) 8) 


2(€) - Geeagcn cea 


The constant C is chosen to make the area under the approximate p.d.f. equal 
to unity. 


If we now consider the case where $\phi(x)$ is the normal density, (4) becomes:

    $\phi(x)\phi(x + w_r) = \phi^2 \exp(-t^2)$.

With this modification the series becomes:
(5) f(wr) ~ Ce*(28)"-" "(3 —"&) "4 i - e=7 8 [3(9’)? — o”"]K* +--+ 5 


1 ; ” , 
where == 2+ 2ry + y) — (n — 2r — 2)6. 


The second term of the series in brackets is of order $1/n$. The next term, of order $1/n^2$, is:


7 BW)* — v" — aw” — 6 + "A 


(n — 2r — 2) 


+ a8 (ov — 150’ 6’ + 3010’) JK +. ° (n — 2r — 2)’ (3(6’)? — o”’"}? K*. 
While these expressions are rather involved, n and r enter into them quite simply. Thus, once the basic combinations of the $\theta$, $\phi$ and $\psi$ functions have been evaluated, the time taken to evaluate $f(w_r)$ is very much less than would be needed if the exact expression (1) were used.


4. Accuracy of the dominant term. A convenient measure of accuracy is provided by comparing the mean value of $w_r$, found from the approximate p.d.f., with the exact value:

(6)  $E(w_r) = 2(r + 1)\dbinom{n}{r+1} \displaystyle\int_{-\infty}^{\infty} x\,[\tfrac12 - \Phi(x)]^r\,[\tfrac12 + \Phi(x)]^{n-r-1}\,\phi(x)\,dx.$

This integral is easily evaluated by quadrature; for r = 0 values are given to 5 decimal places by K. Pearson [13]. The accuracy of the higher moments, and the use of further terms of the expansion, are considered in subsequent paragraphs.
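As an illustration of the quadrature mentioned above, the following is a minimal sketch, assuming a unit normal parent and using Simpson's rule on a truncated interval. The function name `expected_quasi_range`, the integration limits and the number of steps are choices made for this sketch and are not taken from the paper. With $\tfrac12 + \Phi(x)$ written as the ordinary normal distribution function, the integrand of (6) can be coded directly.

```python
import math


def normal_pdf(x):
    """Unit normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Unit normal distribution function, i.e. 1/2 + Phi(x) in the paper's notation."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def expected_quasi_range(n, r, lo=-10.0, hi=10.0, steps=4000):
    """Mean of the r-th quasi-range in samples of n, by Simpson's rule on (6).

    `lo`, `hi` and `steps` are illustrative settings; `steps` must be even.
    """
    if n < 2 * r + 2:
        raise ValueError("need n >= 2r + 2")
    if steps % 2:
        raise ValueError("steps must be even")
    const = 2.0 * (r + 1) * math.comb(n, r + 1)

    def integrand(x):
        upper = normal_cdf(x)    # 1/2 + Phi(x)
        lower = normal_cdf(-x)   # 1/2 - Phi(x)
        return x * lower ** r * upper ** (n - r - 1) * normal_pdf(x)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lo + i * h)
    return const * total * h / 3.0
```

For n = 30 and r = 0, 1, 2 this sketch can be compared with the exact values 4.0855, 3.2312 and 2.7296 quoted in the text.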
For fixed n, k decreases as r increases, provided $w_r$ is greater than 1.8. For values of $w_r$ below 1.8, the quantity:


(7) sey — oe” 


is very small. Thus it is evident from (5) that an increase of r should lead to greater accuracy in the dominant term. We find that, when n = 30, values are as follows.

            Exact value    Error of dominant term
E(w_0)        4.0855             +0.0095
E(w_1)        3.2312             +0.0019
E(w_2)        2.7296             +0.0006


For fixed r, the effect of the second term will depend on the position of the mode of the p.d.f. The expression (7) rises from zero at $w_r = 0$ to a maximum value near $w_r = 4.2$, and then falls to zero again as $w_r$ increases.

For small n, all but the right tail of the distribution will lie in a region where (7) is small. For large n, the mode of the distribution will be well beyond the region where (7) has any appreciable effect.

Thus, as n increases, the effect of the error in the dominant term will first increase and later fall to zero. Since the mean value of $w_r$ increases very slowly with n, the position of greatest error will occur for a sample size of the order of a hundred.

For $E(w_0)$ we find the results:

n          [?]     [?]      20       30       60       100
Error       0     0.0058   0.0086   0.0095   0.0107   0.0114
% error     0     0.202    0.229    0.232    0.232    0.227

Thus, while the error in the mean is still increasing slowly beyond n = 60, the maximum percentage error occurs near 50.

For $w_1$ the mode will be smaller than for $w_0$, for the same sample size, and consequently the increasing error will be longer maintained. Errors in the mean value are given below.

n           [?]       10        20        30        40        60
Error     0.00019   0.00083   0.00153   0.00185   0.00205   0.00227
% error   0.0188    0.0414    0.0543    0.0573    0.0584    0.0586


It seems likely that the maximum percentage error is little in excess of its value 
at n = 60. 


5. Application to range. We first compare the accuracy of the dominant term with that of three other asymptotic expressions.

Gumbel [6] has found the asymptotic distribution of range in the general symmetric population. His result is:

(8)  $f(R) = 2e^{-R} K_0(2e^{-R/2}),$

where

$R = n\phi(u)\{w_0 - 2u\}, \qquad \Phi(u) = 1 - 1/n.$

Elfving [3] derives an asymptotic expression for the normal case; it is:

(9)  $f(\xi) = \xi K_0(\xi),$


where 


:= mii —o/(! 
\ 


In formulae (8) and (9) $K_0$ represents a modified Bessel function of the second kind. Using the method of steepest descent, Cox [2] derives the result:


Ww w ts 
=) | a0 (5) 


f(wo) ~ i 


In the case of range, the dominant term of the series considered here is: 


n(n — IV ar¢ [26]" 


‘ - ; 
2 — (n — 2)¢e 


(11) . 
| 
This is asymptotically identical with (10). However, as $\phi'$ is small for quite moderate values of $w_0$, n has to be very large for good agreement between (10) and (11).

Cox examines the errors of (8), (9) and (10) when n = 20. His results are compared with those obtained from (11) in the following table.

                   Mean      St. Devn.     β₁         β₂
Exact value       3.7350     0.7287      0.1627     3.259
Error of (8)       [?]       +0.15       +0.49      +0.94
Error of (9)      +0.03      +0.04       -0.09      -0.18
Error of (10)     +0.10      +0.05       +0.05      +0.16
Error of (11)     +0.0086    +0.0025     -0.0043    -0.019


We now consider the effect of taking two terms of the series for n = 20, 60 and 100. For n = 20 the exact values are given by Hartley and Pearson [7]; for 60 and 100 values are given by K. Pearson [13]. Errors are as follows:

  n      Mean      St. Devn.     β₁        β₂
  20    -0.0026    -0.0010     +0.0009   +0.006
  60    -0.0042    -0.002      +0.001    +0.02
 100    -0.0050    -0.002      +0.003    +0.02

It will be seen that two terms of the series give a very good approximation to 
the true distribution. 

6. The first quasi-range. The following figures illustrate the effect of taking into account successive terms of the series. They refer to the mean value of $w_1$ when n = 30.

Exact value ....................  3.23120
Error of first term ............ +0.00185
Error of first two terms ....... -0.00025
Error of first three terms ..... +0.00006
Error of first four terms ...... -0.00002


It appears that the use of three terms gives a high degree of accuracy, at least 
for the mean value. 

In order to examine the accuracy of higher moments and percentage points, 
exact values of the p.d.f. were found by quadrature for n = 30. The behaviour 
of the approximation, using three terms of the series, is shown by the following 
figures. 


                  Exact value     Error
Variance           0.25879      +0.00004
β₁                 0.0865       +0.0001
β₂                 3.142        <0.0005
95% point          4.107        +0.001
99.9% point        5.021        +0.001

Below the 95 per cent point errors in percentage points are always less than 0.001.


For n = 10, Godwin’s five place value for the variance agreed exactly with 
that obtained when using three terms of the series. 


7. Efficiency values. The efficiency is defined as:

$100\,\operatorname{var} s \Big/ \dfrac{\operatorname{var} w_r}{[E(w_r)]^2},$

where s is the unbiassed estimate of population standard deviation:

$s = \dfrac{\Gamma\!\left(\tfrac{n-1}{2}\right)}{\sqrt{2}\;\Gamma\!\left(\tfrac{n}{2}\right)} \sqrt{\textstyle\sum (x_i - \bar{x})^2}.$

For range, values of efficiency were obtained from the tables of reference [13]. For $w_1$ and $w_2$, Godwin's values were available up to sample size 10. Beyond this, values were computed from the approximate expressions. Some values, rounded to the nearest 0.5 per cent, are given below.


10 ; 40 60 


efficiency of we oh ; 54.0 44. 
Efficiency of w, i 66.0 59. 
Efficiency of we : 69.0 65.i 


Fig. 1. Efficiencies of the various estimators considered (vertical axis: efficiency, per cent; horizontal axis: sample size).

The general behaviour is illustrated by Fig. 1. It appears that, as far as efficiency is concerned, range is best up to samples of 17. Then the first quasi-range is optimum up to n = 31. It seems likely that $w_3$ becomes optimum round about sample size 50. In any case, the difference between $w_2$ and $w_3$ will probably be small over quite a wide range of sample sizes.


8. Table of values for first quasi-range. The p.d.f. was computed, using three terms of the series, for even values of n from 10 to 20, and also for 25 and 30. Values of moment constants and percentage points were found for these sample sizes. Values for other sizes in the range 10 to 30 were then found by interpolation. As a check, mean values for each sample size were evaluated using (6).

Throughout this work, as in all the earlier quadratures, ordinates were found at intervals of 0.2 in $w_1$. This spacing was close enough to ensure that the difference of odd and even ordinate sums was of the order of one part in a hundred thousand. This proved a valuable check on the computation of p.d.f. values. It was found to fail only for sample sizes below 10. In such cases there is no longer a high enough degree of contact between the p.d.f. and the axis at the origin to secure this balancing of odd and even sums.


TABLE I
Constants for first quasi-range

  n     Mean      Var      β₁
 10    2.0027    0.3423   0.145
 11    2.1238    0.3362   0.131
 12    2.2315    0.3300   0.120
 13    2.3282    0.3240   0.112
 14    2.4158    0.3183   0.106
 15    2.4959    0.3128   0.102
 16    2.5695    0.3076   0.098
 17    2.6376    0.3028   0.096
 18    2.7008    0.2982   0.094
 19    2.7599    0.2938   0.092
 20    2.8152    0.2898   0.091
 21    2.8672    0.2859   0.090
 22    2.9163    0.2823   0.089
 23    2.9627    0.2788   0.088
 24    3.0067    0.2754   0.088
 25    3.0486     [?]     0.087
 26    3.0885     [?]     0.087
 27    3.1265     [?]     0.087
 28    3.1629     [?]     0.087
 29    3.1978     [?]     0.087
 30    3.2312     [?]     0.086

[The percentage-point columns of Table I are illegible in this copy.]

All values were computed to one further place than that given in Table I. Errors should not exceed one unit in the last place quoted, except for variance from 20 to 30. Here the wider interval of interpolation may result in errors up to 1.5 units in the last figure.

We can find constants c, $\lambda$, $\nu$ so that $c\,w_1^{\lambda}$ has approximately a $\chi^2$ distribution with $\nu$ degrees of freedom. Errors do not vary greatly over the range n = 10 to n = 30. Thus we find for n = 30,

c = 16.231,   $\lambda$ = 1.1409,   $\nu$ = 62.

In this case the maximum error in the probability integral deduced from that of $\chi^2$ is 0.0004, occurring near the 20 per cent point; errors are smaller at the tails. From the 5 per cent point to the 97.5 per cent point errors in percentage points of $w_1$, deduced from the corresponding values for $\chi^2$, are always less than 0.001. The error at the 0.1 per cent point is 0.008, while that at the 99.9 per cent point is 0.005.
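The transformation can be tried numerically. The sketch below is an illustration only: it assumes the transformation has the form $c\,w_1^{\lambda} \sim \chi^2_{\nu}$ with the constants quoted for n = 30 (taking $\nu$ = 62 as printed), and it replaces exact $\chi^2$ percentage points by the Wilson-Hilferty cube-root approximation, which is a substitution made here and not part of the paper. The function names are hypothetical.

```python
from statistics import NormalDist

# Constants quoted in the text for n = 30 (nu taken as printed).
C, LAM, NU = 16.231, 1.1409, 62.0


def chi2_quantile_wh(p, nu):
    """Wilson-Hilferty approximation to the p-quantile of chi-square with nu d.f."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * nu)
    return nu * (1.0 - a + z * a ** 0.5) ** 3


def w1_percentage_point(p, c=C, lam=LAM, nu=NU):
    """Approximate p-quantile of w_1, solving c * w**lam = chi-square quantile."""
    return (chi2_quantile_wh(p, nu) / c) ** (1.0 / lam)
```

The values returned for p = 0.95 and p = 0.999 can be set beside the exact percentage points 4.107 and 5.021 given in Section 6; agreement to about two decimal places is all that should be expected from this double approximation.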

This type of transformation enables Bartlett's test to be used as an approximate test of homogeneity of a set of values of $w_r$, each for the same sample size. It has been found to give results of similar accuracy for range and average range. It is hoped that a detailed study of this transformation will be completed and published shortly.

9. Linear combinations of quasi-ranges. Godwin [5] determines the optimum linear combination of $w_0, w_1, w_2, \ldots$, for the estimation of standard deviation. Such an estimator uses all the possible quasi-ranges, and for n = 10 gives an efficiency of 99.0 per cent.

For rapid estimation, attention must be restricted to a few values of $w_r$. Mosteller [10] considers certain unweighted sums of two values. His investigation is restricted to large samples, where the $w_r$ are replaced by inter-quantile
distances. Nair [11] considers the sum of the first / quasi-ranges for sample 
sizes up to 10, while Jones [9] investigates it in the large sample case. 

Using the methods of Section 3, we can derive approximations to the covariances of pairs of $w_r$ values. Thus, for $w_0$ and $w_1$ we find that the quadruple integral for $E(w_0 w_1)$ is approximately equal to:


D 
K / ure'(h — &)(20)" *k dw, 


— 4)9’. 


Here, all Greek letters are functions of $\tfrac12 w_1$. When n = 10, comparison with Godwin's values [4] shows that the error in the covariance is 0.0028. The use of another term of the series reduces this error to 0.0009 units.

In deriving these errors, the exact value of K given above was not used. Instead, the integral was first evaluated with the factor $w_1$ omitted. This is an approximation to $E(w_0)$, and K can be found to satisfy this condition exactly. This device is analogous to the determination of C in (5) so as to make the area under the p.d.f. equal to unity. As in the previous case it gives appreciably better results than does the use of the theoretically correct constant.

For $w_1$ and $w_2$ we have:


(12) E(w, w2) = L | wee (k — &)*[y — 3h(we)](2b)"~ k dw». 


where 
(13) 


and 


, ” -.73 , 9 ; 
(14) ; ; — a Ww + +2 + (y = ¥ hws) | 


v — th(w.) 

All functions without an argument shown are to be taken with argument $\tfrac12 w_2$. Comparison with Godwin's value shows that this expression yields $\operatorname{cov}(w_1, w_2)$ with an error of 0.0005 units, when n = 10.

The two estimators:

(a)  $\dfrac{w_0 + \lambda w_1}{E(w_0) + \lambda E(w_1)}$   and   (b)  $\dfrac{w_1 + \lambda w_2}{E(w_1) + \lambda E(w_2)}$

were considered, the constant $\lambda$ being chosen to maximise efficiency. Results are shown by the dotted curves in the figure. Values, expressed to the nearest 0.5 per cent, are given below.


(2) 


(a) 
Efficiency... .. ..| 96.5 87.0 
| 0.85 1.47 


Efficiency or ee 79.5 


0.36 | 0.81 : 1.40 


As is to be expected, the value of $\lambda$ used is not at all critical. Thus, a convenient integer can be used with little loss of efficiency. For instance, using the estimator (a) with $\lambda = 2$ when n = 40 results in a loss of 1.1 per cent.

It is evident from the figure that the use of both $w_0$ and $w_1$ offers an appreciable advantage for sample sizes from 10 to 30. The combination of $w_1$ and $w_2$ is not so impressive. It seems likely that $w_2$ used with a higher order quasi-range might be better from 30 to 60.

The p.d.f. of such estimates will be a trivariate integral. While the methods 
used above allow this to be replaced approximately by a single integral, the 
labour of evaluation for a set of values would still be considerable. However, for 
some purposes, confidence limits based on a normal approximation will be satis- 
factory. 

Alternatively, the approximate evaluation of $\beta_1$ is possible. Thus, if the analogy with $w_0$ and $w_1$ can be relied upon, an approximation based on a power of $\chi^2$ should give a degree of accuracy sufficient for most purposes.

I should like to thank Mr. D. F. Mills, who carried out the computations necessary for the tabled values of constants for first quasi-range.

SOME NOTES ON THE APPLICATION OF SEQUENTIAL
METHODS IN THE ANALYSIS OF VARIANCE


By N. L. Johnson
University of North Carolina and University College, London 


Summary. Sequential tests of linear hypotheses in the systematic linear model 
are studied. Methods of overcoming difficulties in the construction of tests when 
there is a random model are considered. 


1. Introduction. The original methods [1] of constructing sequential tests required problems to be formulated as discrimination between two simple hypotheses. In cases where composite hypotheses were involved, a more or less arbitrary weighting function was introduced so that the problem could in effect be reduced to discrimination between simple hypotheses. Recent work by Barnard [2] and Cox [3] has made it possible to extend the sphere of application of sequential tests to a number of cases where composite hypotheses are to be compared. Barnard refers to unpublished work by C. M. Stein on this problem and there is a remark in [4] which implies that both Stein and M. A. Girshick approach the problem from the same angle as Cox.

It is the purpose of these notes to discuss some points of detail arising in the 
application of sequential methods to the particular type of composite hypotheses 
associated with the analysis of variance. Tests of the general linear hypothesis 
in systematic (parametric) models will be discussed first, followed by a discus- 
sion of component of variance models for simple special cases. 


2. The general linear hypothesis. It will be helpful to start with a brief resumé of the general linear hypothesis and its likelihood ratio test in the case of samples of fixed size [5], [6].

It is assumed that

(i) $x = (x_1, \ldots, x_N)$ are N independent normal variables,
(ii) $\mathcal{E}(x) = \theta C'$, where $\theta = (\theta_{(1)} \,\vdots\, \theta_{(2)})$ and $C = (C_{(1)} \,\vdots\, C_{(2)})$, C being partitioned between the (s - q)th and (s - q + 1)st columns, as is $\theta$,
(iii) $\operatorname{var} x_i = \sigma^2$ (i = 1, ..., N),
(iv) the $\theta$'s and $\sigma^2$ are unknown parameters; the c's are known constants, defined by the design of the experiments.

Received 11/29/52.

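The hypothesis to be tested is $H_0: \theta_{(2)} = (0, 0, \ldots, 0) = 0$. A likelihood ratio criterion for this problem is provided by any monotonic function of $G = S_1/S_2$, where

$S_2$ is the minimum of $(x - \theta C')(x - \theta C')'$ with respect to $\theta$,

$S_1 + S_2$ is the minimum of $(x - \theta_{(1)} C'_{(1)})(x - \theta_{(1)} C'_{(1)})'$ with respect to $\theta_{(1)}$.

It can be shown [4], [7] that the probability density function of G satisfies

(1)  $p(G \mid \theta_{(2)}\sigma^{-1}) = e^{-\frac12\lambda}\, p(G \mid 0)\, M\big(\tfrac12(N - s + q),\ \tfrac12 q;\ \tfrac12\lambda G(1 + G)^{-1}\big),$

where

$\lambda = \sigma^{-2}\,\theta_{(2)} C'_{(2)} \big(I - C_{(1)}(C'_{(1)} C_{(1)})^{-1} C'_{(1)}\big) C_{(2)}\, \theta'_{(2)}$

and

$M(X, Y; u) = \displaystyle\sum_{j=0}^{\infty} \dfrac{\Gamma(Y)\,\Gamma(X + j)}{\Gamma(X)\,\Gamma(Y + j)}\, \dfrac{u^j}{j!}$

is a confluent hypergeometric function.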
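Since the density ratio in (1) involves only the confluent hypergeometric function, it can be evaluated directly from the power series. The following is a minimal sketch under that reading of (1); the function names `kummer_m` and `log_density_ratio`, the tolerance and the term limit are hypothetical choices made for illustration, and the plain series is only suitable for moderate positive arguments.

```python
import math


def kummer_m(x, y, u, tol=1e-15, max_terms=10000):
    """Series for M(X, Y; u) = sum_j Gamma(Y)Gamma(X+j) / (Gamma(X)Gamma(Y+j)) * u**j / j!."""
    term = 1.0
    total = 1.0
    for j in range(max_terms):
        # Ratio of successive terms: (X + j) u / ((Y + j)(j + 1)).
        term *= (x + j) * u / ((y + j) * (j + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def log_density_ratio(g, n_obs, s, q, lam1, lam2):
    """log[p(G | lam2) / p(G | lam1)] implied by (1); the factor p(G | 0) cancels."""
    y = g / (1.0 + g)
    x = 0.5 * (n_obs - s + q)
    return (-0.5 * (lam2 - lam1)
            + math.log(kummer_m(x, 0.5 * q, 0.5 * lam2 * y))
            - math.log(kummer_m(x, 0.5 * q, 0.5 * lam1 * y)))
```

Two closed forms give a quick check of the series: M(X, X; u) = exp(u) and M(1, 2; u) = (exp(u) - 1)/u.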


3. Sequential analysis for the systematic model. Now consider a sequential form of experiment in which successive x's, or sets of x's, are measured until a decision is reached. As the experiment is continued, so will C grow by the addition of further rows. The way in which it is decided to obtain each successive observation (or set of observations) will determine the numerical values of the c's in the successive rows of C. In these notes only those cases where the design is predetermined (i.e., where the c's are not random variables depending on the results of earlier measurements) will be considered, although it would appear, intuitively, that determination of the c's on the basis of results already observed would lead to improved procedures.

At each stage in the experiment a value $G^{[N]}$ may be calculated. (The superscript [N] means "pertaining to sample size N," and will be omitted when confusion is not likely to be incurred by such omission.) The distributions of the corresponding random variables will depend on $\theta_{(2)}\sigma^{-1}$ through the parameters $\lambda^{[N]}$. It might be hoped to use the sequence $G^{[N]}$ in a test to discriminate between the hypotheses


H 5:00" QJ | 


> =). 


(Evidently by taking $\lambda_1 = 0$ and choosing $\lambda_2$ suitably a sequential test discriminating between $H_1$ and $H_2$ could be compared with the likelihood ratio test of $H_0$.)

Such a sequential test could certainly be obtained if

(2)  $p(G^{[1]}, \ldots, G^{[N]} \mid \theta_{(2)}\sigma^{-1}) = p(G^{[N]} \mid \theta_{(2)}\sigma^{-1})\, f(G^{[1]}, \ldots, G^{[N]}),$

where $f(G^{[1]}, \ldots, G^{[N]})$ does not depend on $\theta_{(2)}\sigma^{-1}$. Cox [3] gives conditions under which (2) is true.

If it is possible to pick out a subsequence $G^{[N_\nu]}$ such that each of the terms in the corresponding subsequence $\lambda^{[N_\nu]}$ depends only on the same scalar function of $\theta_{(2)}\sigma^{-1}$ then it can be shown that Cox's conditions apply, and so

$p(G^{[N_1]}, \ldots, G^{[N_\nu]} \mid \theta_{(2)}\sigma^{-1}) = p(G^{[N_\nu]} \mid \theta_{(2)}\sigma^{-1})\, f(G^{[N_1]}, \ldots, G^{[N_\nu]}).$

Since $p(G^{[N_\nu]} \mid \theta_{(2)}\sigma^{-1})$ depends only on $\lambda^{[N_\nu]}$, which is itself required to depend only on some scalar function $\phi(\theta_{(2)}\sigma^{-1})$ of $\theta_{(2)}\sigma^{-1}$, we may write

$p(G^{[N_1]}, \ldots, G^{[N_\nu]} \mid \phi) = p(G^{[N_\nu]} \mid \phi)\, f(G^{[N_1]}, \ldots, G^{[N_\nu]}).$

It follows that

(a) all hypotheses (about $\theta_{(2)}\sigma^{-1}$) which specify the same value for $\phi$ will be, for present purposes, equivalent,

(b) a sequential test for discriminating between the hypotheses

$H': \phi = \phi'$ and $H'': \phi = \phi''$

will be specified by instructions of the form

(I) "Accept $H'$ if $\dfrac{p(G^{[N_\nu]} \mid \phi'')}{p(G^{[N_\nu]} \mid \phi')} \le \dfrac{\beta}{1 - \alpha}$.

Accept $H''$ if $\dfrac{p(G^{[N_\nu]} \mid \phi'')}{p(G^{[N_\nu]} \mid \phi')} \ge \dfrac{1 - \beta}{\alpha}$.

Otherwise take a further set of $N_{\nu+1} - N_\nu$ observations in accordance with the prescribed pattern"

provided this sequence of operations terminates with probability one when either $H'$ or $H''$ is true.

The prescribed pattern will, of course, be such that $\lambda^{[N_\nu]}$ depends only on $\phi(\theta_{(2)}\sigma^{-1})$.
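The instructions of form (I) amount to comparing a likelihood ratio with two fixed limits at each stage. As a hedged sketch of that decision rule, the code below works on the logarithmic scale and takes the log likelihood ratios as given inputs; the function names and the returned labels are hypothetical and are not notation from the paper.

```python
import math


def wald_limits(alpha, beta):
    """Logarithmic limits log(beta/(1-alpha)) and log((1-beta)/alpha)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def decide(log_ratio, alpha, beta):
    """Apply rule (I) at one stage to log[p(G | phi'') / p(G | phi')]."""
    lower, upper = wald_limits(alpha, beta)
    if log_ratio <= lower:
        return "accept H'"
    if log_ratio >= upper:
        return "accept H''"
    return "continue"


def run_test(log_ratios, alpha, beta):
    """Scan successive stages; return (decision, stage at which it was reached)."""
    stage = 0
    for stage, log_ratio in enumerate(log_ratios, start=1):
        decision = decide(log_ratio, alpha, beta)
        if decision != "continue":
            return decision, stage
    return "continue", stage
```

Note that each entry of `log_ratios` is the ratio for the current statistic $G^{[N_\nu]}$ alone, in line with the factorisation above, and not a running sum of independent increments.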


4. Limiting form of the test. Decision to take a further set of observations will be made if

$\dfrac{\beta}{1 - \alpha} < \dfrac{p(G \mid \phi'')}{p(G \mid \phi')} < \dfrac{1 - \beta}{\alpha},$

where we have, for convenience, dropped the superscript [N]. From (1) this is equivalent to

$A + \tfrac12(\lambda'' - \lambda') < \log M\big(\tfrac12(N_\nu - s + q), \tfrac12 q; \tfrac12\lambda'' y\big) - \log M\big(\tfrac12(N_\nu - s + q), \tfrac12 q; \tfrac12\lambda' y\big) < B + \tfrac12(\lambda'' - \lambda'),$

where

$A = \log\dfrac{\beta}{1 - \alpha}, \qquad B = \log\dfrac{1 - \beta}{\alpha}, \qquad y = G(1 + G)^{-1}.$

The application of tests of this type (for q > 1) requires tables of M(X, Y; u) rather more extensive than are known to the author at present. We can, however, make certain deductions about inequalities (2), incidentally showing that in certain important cases, the sequence does, indeed, terminate with probability one when either $H'$ or $H''$ is true.

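It will be assumed from now on that $\phi'' > \phi'$ and $\lambda'' > \lambda'$. As u increases from 0 to 1,

$[\log M(X, Y; \tfrac12\lambda'' u) - \log M(X, Y; \tfrac12\lambda' u)]$

increases from 0 to

$[\log M(X, Y; \tfrac12\lambda'') - \log M(X, Y; \tfrac12\lambda')]$

(if X > 0 and Y > 0). Hence (2) is equivalent to

(3)  $\underline{G}(N_\nu, \alpha, \beta, \lambda', \lambda'') < G < \overline{G}(N_\nu, \alpha, \beta, \lambda', \lambda''),$

$\underline{G}$ and $\overline{G}$ being fixed numbers defined by the quantities in brackets. (N.B. If $A + \tfrac12(\lambda'' - \lambda') \le 0$ then $\underline{G} = 0$, while if $B + \tfrac12(\lambda'' - \lambda') \ge \log M(X, Y; \tfrac12\lambda'') - \log M(X, Y; \tfrac12\lambda')$ then $\overline{G} = \infty$.)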

Now consider the special case where successive sets of k observations are taken, so that $N_n = kn$, and each set is arranged in the same pattern, so that identical sets of rows are added to C at each stage. In this case it is possible to take $\phi = \lambda^{[k]}/k$ and then $\lambda^{[N_n]} = \lambda^{[kn]} = nk\phi$ is a function of $\phi$ only. ($\phi$ may be thought of as "noncentrality per unit observation.") (2) now becomes

(4)  $A + \tfrac12 nk(\phi'' - \phi') < \log M\big(\tfrac12(nk - s + q), \tfrac12 q; \tfrac12 nk\phi'' y\big) - \log M\big(\tfrac12(nk - s + q), \tfrac12 q; \tfrac12 nk\phi' y\big) < B + \tfrac12 nk(\phi'' - \phi').$

Now Perron [8] has shown that

$M(X, Y; u) = \dfrac{\Gamma(Y)}{2\sqrt{\pi}}\,(Xu)^{\frac14 - \frac12 Y}\, e^{\frac12 u + 2\sqrt{Xu}}\,\big(1 + O(X^{-\frac12})\big).$

Hence from (4) if n is large and $\phi' \ne 0$, we have

(5)  $A + \tfrac12 nk(\phi'' - \phi') + O(n^{-\frac12}) < \tfrac14(q - 1)\log(\phi'/\phi'') + \tfrac14 nk(\phi'' - \phi')\,y + n[k\{k - (s - q)n^{-1}\}]^{\frac12}(\sqrt{\phi''} - \sqrt{\phi'})\,y^{\frac12} < B + \tfrac12 nk(\phi'' - \phi') + O(n^{-\frac12}).$

Remembering that $\phi'' > \phi'$, (5) may be rearranged to give

(6)  $2 + \dfrac{(q - 1)\log(\phi''/\phi') + 4A}{nk(\phi'' - \phi')} + O(n^{-1}) < y + \dfrac{4y^{\frac12}}{\sqrt{\phi''} + \sqrt{\phi'}} < 2 + \dfrac{(q - 1)\log(\phi''/\phi') + 4B}{nk(\phi'' - \phi')} + O(n^{-1}).$

This implies

$\left|\, y + \dfrac{4y^{\frac12}}{\sqrt{\phi''} + \sqrt{\phi'}} - 2 \,\right| < O(n^{-1}).$

As n tends to infinity, y tends to $\phi(1 + \phi)^{-1}$ and $y^{\frac12}$ tends to $\phi^{\frac12}(1 + \phi)^{-\frac12}$ almost certainly. Unless

(7)  $\dfrac{\phi}{1 + \phi} + \dfrac{4\sqrt{\phi}}{(\sqrt{\phi''} + \sqrt{\phi'})\sqrt{1 + \phi}} = 2$

for either $\phi = \phi'$ or $\phi = \phi''$ it is clear that the sequence (I) must terminate with probability one in either case. It is of interest to note that large samples are not likely to be reached unless (7) is satisfied, at any rate approximately. From (7)

(8)  $\phi^{\frac12}(1 + \phi)^{-\frac12} = 2\{\tfrac12 + (\sqrt{\phi''} + \sqrt{\phi'})^{-2}\}^{\frac12} - 2(\sqrt{\phi''} + \sqrt{\phi'})^{-1}.$

We may note that if both $\phi''$ and $\phi'$ are small (8) gives $\sqrt{\phi} \approx \tfrac12(\sqrt{\phi''} + \sqrt{\phi'})$, which is not unexpected on intuitive grounds.
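The critical value of $\phi$ can be obtained numerically. The sketch below assumes that (7) is the quadratic $t^2 + 4t/c - 2 = 0$ in $t = \{\phi/(1+\phi)\}^{1/2}$ with $c = \sqrt{\phi''} + \sqrt{\phi'}$, so that (8) is its positive root; this reading is a reconstruction, and the function name is hypothetical. On that reading a finite $\phi$ exists only when the root is below one, which works out to $c < 4$; the sketch returns `None` otherwise.

```python
import math


def critical_phi(phi1, phi2):
    """Value of phi satisfying (7) for alternatives phi' = phi1 and phi'' = phi2.

    Returns None when the positive root t of t**2 + 4 t / c - 2 = 0 is not
    below one, since t = sqrt(phi / (1 + phi)) must be less than one.
    """
    c = math.sqrt(phi1) + math.sqrt(phi2)
    t = math.sqrt(2.0 + 4.0 / c ** 2) - 2.0 / c
    if t >= 1.0:
        return None
    return t * t / (1.0 - t * t)
```

For small alternatives, say $\phi' = 0.0001$ and $\phi'' = 0.0004$, the square root of the returned value is close to the average 0.015 of the two square roots, as the remark following (8) indicates.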

The argument above applies to the case $\phi' \ne 0$. If $\phi' = 0$ then instead of (5) we obtain

A + 4nko” + O(n’) < (q — 1) log ko” + nko” yy 
+ nk{1 —(s — q)/nk}'y oy < B+ jnko” + O(n ‘) 

leading finally to (7) with $\phi' = 0$.


5. One-way classification. The special case of one-way classification into k groups will now be considered. At each stage we decide whether or not to take a further set of k observations, one from each group. The usual systematic model

(9)  $x_{ti} = a + b_t + z_{ti}$   $(t = 1, \ldots, k;\ i = 1, \ldots, n;\ \sum_t b_t = 0)$

is included in the general linear model described in Section 2. Then, in the notation of Section 3, $N_n = nk$, s = k and the hypothesis $b_t = 0$ (t = 1, ..., k) implies q = k - 1. Further

$G^{(n)} = n\displaystyle\sum_t (\bar{x}_{t\cdot} - \bar{x}_{\cdot\cdot})^2 \Big/ \sum_t \sum_i (x_{ti} - \bar{x}_{t\cdot})^2,$

$\lambda^{(n)} = n\displaystyle\sum_t b_t^2/\sigma^2, \qquad \phi = \sum_t b_t^2/k\sigma^2.$

Hence the sequence of values $G^{(n)}$ may be used in a sequential test based on discrimination between $H': \sum_t b_t^2/k\sigma^2 = \phi'$ and $H'': \sum_t b_t^2/k\sigma^2 = \phi''$ and constructed as in (I).


6. Random model in the one-way classification. It is well known that sometimes a model different from (9) may be used. This is

(10)  $x_{ti} = a + u_t + z_{ti}$

where the u's are normal variables, each with expected value zero and standard deviation $\sigma_u$, mutually independent and also independent of the z's. This model is often called the "random" or "component of variance" model. In this case

(11)  $p(G^{(n)} \mid \delta) = \dfrac{1}{B\big(\tfrac12(k - 1), \tfrac12 k(n - 1)\big)}\; \dfrac{(1 + n\delta^2)^{\frac12 k(n-1)}\, G^{\frac12(k-3)}}{(1 + n\delta^2 + G)^{\frac12(kn-1)}}$

where $\delta = \sigma_u/\sigma$. This depends only on $\delta$, which plays the part played by $\phi$ in the systematic model. Cox's result shows that the sequence of values $G^{(n)}$ may be used in a sequential test based on discrimination between $H': \delta = \delta'$ and $H'': \delta = \delta''$ provided the procedure specified terminates with probability one.
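The criterion G used at each stage is the ratio of the between-groups to the within-groups sum of squares, as in Section 5. A minimal sketch of its computation for k groups of n observations each is given below; the function name and the list-of-lists input layout are assumptions made for illustration.

```python
def g_statistic(groups):
    """Between-groups / within-groups sum-of-squares ratio for equal-sized groups.

    `groups` is a list of k lists, each holding the n observations of one group.
    """
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need k >= 2 groups of equal size n >= 2")
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / k
    between = n * sum((m - grand_mean) ** 2 for m in means)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return between / within
```

For the two groups (1, 2, 3) and (2, 3, 4) the between-groups sum of squares is 1.5 and the within-groups sum is 4, so that G = 0.375.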
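The procedure will be

(II) "Accept $H'$ if $f(G, \delta', \delta'') \le \beta/(1 - \alpha)$.
Accept $H''$ if $f(G, \delta', \delta'') \ge (1 - \beta)/\alpha$.
Otherwise measure one further item in each of the k groups"

where

$f(G, \delta', \delta'') = \left(\dfrac{1 + n\delta''^2}{1 + n\delta'^2}\right)^{\frac12 k(n-1)} \left(\dfrac{1 + n\delta'^2 + G}{1 + n\delta''^2 + G}\right)^{\frac12(kn-1)}.$

Under this procedure sampling will be continued whenever $\underline{G} < G < \overline{G}$ where

(12)  $\underline{G} = n(\delta''^2\underline{\epsilon} - \delta'^2)(1 - \underline{\epsilon})^{-1} - 1; \qquad \overline{G} = n(\delta''^2\overline{\epsilon} - \delta'^2)(1 - \overline{\epsilon})^{-1} - 1,$

$\underline{\epsilon} = \left(\dfrac{\beta}{1 - \alpha}\right)^{2/(kn-1)} \left(\dfrac{1 + n\delta'^2}{1 + n\delta''^2}\right)^{k(n-1)/(kn-1)},$

$\overline{\epsilon} = \left(\dfrac{1 - \beta}{\alpha}\right)^{2/(kn-1)} \left(\dfrac{1 + n\delta'^2}{1 + n\delta''^2}\right)^{k(n-1)/(kn-1)}.$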
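The limits in (12) are easily computed. The following sketch assumes the forms of the likelihood ratio f and of (12) reconstructed from (11), with $\delta$ entering through $1 + n\delta^2$; the function names are hypothetical. A lower limit that comes out negative is replaced by zero, and an upper limit is reported as infinite when the corresponding $\epsilon$ is not below one; both conventions are choices made for this sketch.

```python
import math


def likelihood_ratio(g, n, k, d1, d2):
    """f(G, delta', delta'') = p(G | delta'') / p(G | delta') under (11)."""
    c1 = 1.0 + n * d1 ** 2
    c2 = 1.0 + n * d2 ** 2
    return (c2 / c1) ** (0.5 * k * (n - 1)) * ((c1 + g) / (c2 + g)) ** (0.5 * (k * n - 1))


def continuation_limits(n, k, d1, d2, alpha, beta):
    """Lower and upper limits of G between which sampling continues, from (12)."""
    c1 = 1.0 + n * d1 ** 2
    c2 = 1.0 + n * d2 ** 2

    def limit(threshold):
        eps = threshold ** (2.0 / (k * n - 1)) * (c1 / c2) ** (k * (n - 1) / (k * n - 1))
        if eps >= 1.0:
            return math.inf  # the ratio never reaches this threshold
        return n * (d2 ** 2 * eps - d1 ** 2) / (1.0 - eps) - 1.0

    lower = max(limit(beta / (1.0 - alpha)), 0.0)
    upper = limit((1.0 - beta) / alpha)
    return lower, upper
```

A useful check is that the likelihood ratio evaluated at a positive finite limit reproduces the corresponding threshold $\beta/(1-\alpha)$ or $(1-\beta)/\alpha$.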


It can easily be shown that, if $\delta'' > \delta' > 0$,

$\lim_{n\to\infty} \underline{G} = [2A + (k - 1)\log(\delta''^2/\delta'^2)] \big/ k(\delta'^{-2} - \delta''^{-2}),$

$\lim_{n\to\infty} \overline{G} = [2B + (k - 1)\log(\delta''^2/\delta'^2)] \big/ k(\delta'^{-2} - \delta''^{-2}),$

so that $\lim_{n\to\infty} \underline{G} \ne \lim_{n\to\infty} \overline{G}$.

On the other hand if $\delta' = 0$ (the case considered by Cox) $\lim_{n\to\infty} \underline{G} = \lim_{n\to\infty} \overline{G} = 0$.

The random model can be regarded as a mixture of systematic models in which the quantity $\sum_t b_t^2/k\sigma^2 = \phi$ is distributed as $\delta^2(\chi^2_{k-1}/k)$. For any systematic model such that $\lim_{n\to\infty} \underline{G} < \phi < \lim_{n\to\infty} \overline{G}$ there is a nonzero probability that $\underline{G} < G^{(n)} < \overline{G}$ for all n. If $\delta \ne 0$ there will be a nonzero proportion of systematic models, in the mixture constituting the random model, for which this is the case. Hence, the sequential procedure outlined above for $\delta'' > \delta' > 0$ will not conclude with probability one unless $\delta = 0$.







Now consider the test based on $\delta'' > \delta' = 0$. This will terminate with probability one both when $\delta = \delta''$ and when $\delta = 0$, and so can be used as a sequential test. Some values of $\underline{G}$ and $\overline{G}$ for this case are given in Tables Ia and Ib. As might be expected the procedure is not very practicable for small values of k. For example, in neither of the cases covered by the tables is it possible to obtain

TABLE Ia 
&’ = 0, a= 8= 


(i) Values of Gr 


30 60 


146 1.414 

744 - 855 

569 §36 

. 469 .517 

.403 .439 

-3°5 385 

208 | -2e | 319 344 
mat | see 1 « ; - 290 .312 
‘am |} a . 267 287 
-190 .197 .21 od . 248 . 265 
once | 28 | - 232 . 248 
. 153 { j 195 . 207 
125 3 13 ; . 156 165 

.094 096 tt ‘ -114 .120 
056 «.057~—xj ‘ 066 -069 


Values of Gr 


10 5 30 0 


_— 2 , . 5.542 ! : 2. 2.435 2.049 1. 
2.909 B 2. 73 j . ‘ 279 1.169 1.068 
-479 3 u ; 7 . : r ‘ -806  .755 

014 i ‘ ‘ .793 76 a .73 : é -629  .597 
783 ‘ d 635. : ‘ ‘ . -522 | .499 

. 640 ; : 53 4 ; -450  .433 
.545 ‘ 76 Pts b J .397 | .384 
.477 d é . ‘ ‘ : .357 346 
.424 ; 7 i 3 ‘ j .325 | .316 


Psoaw & & te 


- Oo © 


. 383 3 3 7 33% 3 . . .299 | .291 
350 335 3 ‘ 3 ; 28 : “ -277 270 
. 280 2 262 |. 25 . ° “ ‘ -229 | .224 
212 . . ° . ° . -180 177 
- 146 ; ° 135.18 ‘ ° 129.127 


079 ‘ . ‘ e ‘ ‘ ‘ ‘ J .073 = .073 


rw 


ow 


a decision in favor of $H'$ ($\delta = 0$) with $n \le 60$ if k = 2. In such cases it will probably be a good scheme to curtail testing at some convenient value of n.


7. Alternative procedures. An alternative method of procedure is to keep n constant and to decide, at each stage, whether to choose another group at random and take a sample of n from it. This method, which may sometimes be practicable, has the advantage that it has in general a probability of one of concluding even when based on a nonzero value for $\delta'$. Since the ratio $p(G \mid \delta'')/p(G \mid \delta')$ will have the same mathematical expression as in (II) it follows that the appropriate values of $\underline{G}$, $\overline{G}$ are those given by (12). Further, in the case when $\delta' = 0$ tables such as Table I may be used in carrying out the test, proceeding along the rows of the table (i.e., increasing k) instead of down the columns (i.e., increasing n). By analogy with the case of samples of fixed size it would be expected that this

TABLE Ib 
vo=0, a= f= 001 
(i) Values of Gr 


~ 9 10 il 


-- _ .057 

. 030 104 167 

O88 .143 189 
.110 153. 188 218 
018 117 152 is] 206 
.034 -119 -.149 178 193 
O44 118 143 164 182 
001 050 115 138 .156 171 
. 009 054 112 | .132 ; .149 | .162 
O15 056 —=(« 109 .127 142 154 
020.057 -106  .122 136 147 
. 028 058 096 109 =. 120 129 
032 .055 .07 O84 093 .101 108 
032 | .O47 | . 066 .073  .078 .082 
.025 .033 042 046 048 050 


(ii) Values of Gr 


9 15 2 30 60 

- - - — 47.044 2.533 { 5.999 3.032 2.275 
_ — 14.911 6.572 4.459 3 2.952 | 2.599 | 2.3 2.17 d 757 .314 134 
058 5.470 3.150 342 1.934 524 1.407 2: 5 083 .881 | .791 
7.645 2.548 | 1.792 457 1.263 1.052 98s ; O81 | .7 .678 =. 620 
2. S87 675 1.271 1.070 .950 S14 766 73 : ‘ 644 -558 | .517 
G64 255 990 852 767 7 669 . 638 .§13 on oF 543 |. 478 446 
1.490 1.007 -814 qi | .667 |. 571 546 $12 |. 472 420.395 
203 843 694 .612 561 .500 480 ‘ 5: 419 .376 355 
009 | .727 .606 .539 .496 . 446 429 ‘ 406 3 .378 .342 324 
870 =.640 .539 482.446. .403 389 a 3 345 32 314 298 
765 | .572 -486 | .437  .406 é 35S 356 ; 336 333 .318 3 290 277 


até 


LAP eB WH 


564 436 .377 | .344 322 295 286 278 a . 269 259 239 229 
394.315 278 | .256  .242 | .23 .224 219 2 207 201 . 187 181 
.249 .208 185 | .173 | .168 | . 155 152 . 149 145.141 137 133 130 
. 120 104 -096 .091 OSS 084 .083 . O82 00 O78 077 075 074 ~=«.072 


latter procedure should give, in general, a lower average sample number (of indi- 
viduals) than the method based on increasing n. 

It is interesting to note [9] that, while it is impossible to construct a fixed size sample test using G as criterion if $\delta' \ne 0$ unless k exceeds a certain minimum value depending on $\alpha$, $\beta$ and $\delta''/\delta'$, for a sufficiently large k a fixed sample size test is available. This suggests that there should be sequential tests, having the required properties, available for a somewhat wider range of values of k.

8. Further alternative procedures. Owing to the method of construction of our tests it is not possible to use Wald's approximate formula for average sampling number, which applies to a sequence of identically distributed and independent random variables. It, therefore, seems worthwhile to note that the hypotheses discussed in earlier sections can be subjected to sequential tests of standard type based on sequences of independent random variables. This was pointed out to me by A. G. Baker, but the idea has also been used by O. J. Carpenter [10].

In the systematic case such a test can be constructed as follows. Successive samples of size m ($\ge 2$) are taken from each of the k groups. A value, g, of G is calculated separately for each such sample of size km and the sequence of independent g's so obtained used in a sequential procedure constructed in the usual manner. Such a procedure will not, however, lead to independent g's in the random model. If a new set of k groups is chosen at random each time, and an observation taken from each, then the successive g's will be independent. If we denote the ith value obtained by $g_i$ the procedure will be

(III) "After the nth set of km observations:

Accept $H': \delta = \delta'$ if

$\displaystyle\sum_{i=1}^{n} \log\dfrac{1 + m\delta'^2 + g_i}{1 + m\delta''^2 + g_i} \le \dfrac{2A}{km - 1} + \dfrac{nk(m - 1)}{km - 1}\log\dfrac{1 + m\delta'^2}{1 + m\delta''^2}.$

Accept $H'': \delta = \delta''$ if

$\displaystyle\sum_{i=1}^{n} \log\dfrac{1 + m\delta'^2 + g_i}{1 + m\delta''^2 + g_i} \ge \dfrac{2B}{km - 1} + \dfrac{nk(m - 1)}{km - 1}\log\dfrac{1 + m\delta'^2}{1 + m\delta''^2}.$

Otherwise take a further sample of m from each group."
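Procedure (III) is a test of standard type on the independent values $g_i$, and can be sketched as a running sum compared with two limits that move linearly with the stage number. The code below assumes the inequalities in the form obtained by summing the logarithm of the ratio f of Section 6 with n replaced by m; the function name, the argument names and the returned labels are hypothetical.

```python
import math


def procedure_iii(g_values, m, k, d1, d2, alpha, beta):
    """Apply (III) to successive independent values g_1, g_2, ...

    Returns (decision, number of sets of km observations used).
    """
    a = math.log(beta / (1.0 - alpha))
    b = math.log((1.0 - beta) / alpha)
    c1 = 1.0 + m * d1 ** 2
    c2 = 1.0 + m * d2 ** 2
    # Amount added to both limits for every further set of km observations.
    drift = k * (m - 1) / (k * m - 1) * math.log(c1 / c2)
    total = 0.0
    stage = 0
    for stage, g in enumerate(g_values, start=1):
        total += math.log((c1 + g) / (c2 + g))
        if total <= 2.0 * a / (k * m - 1) + stage * drift:
            return "accept H'", stage
        if total >= 2.0 * b / (k * m - 1) + stage * drift:
            return "accept H''", stage
    return "continue", stage
```

With m = 4, k = 5, $\delta' = 0$, $\delta'' = 1$ and $\alpha = \beta = 0.05$, a single value g = 1 leaves the test undecided, while a second value g = 1 leads to acceptance of $H''$.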

In this case, using Wald's approximate formula for average sample number, the approximate expected number of individuals to be observed is $mk\{A(1 - \alpha) + B\alpha\}/E_{m,k}(\delta')$ if $H'$ is true, $mk\{B(1 - \beta) + A\beta\}/E_{m,k}(\delta'')$ if $H''$ is true, where

$E_{m,k}(\delta) = \tfrac12 k(m - 1)\log\dfrac{1 + m\delta''^2}{1 + m\delta'^2} + \tfrac12(km - 1)\,\mathcal{E}\!\left[\log\dfrac{1 + m\delta'^2 + g_i}{1 + m\delta''^2 + g_i} \,\Big|\, \delta\right].$
The leading terms of a useful approximate expression for $E_{m,k}(\delta)$ (obtained by the method of statistical differentials) are


s 1 + mé” 


E.nx(6) = 4k(m — 1) log —— 
F as + mé’ 


+ 3(km — 1) oe (R’/R”) 


m(m — 1)k(k — 1) s*)° 6 ey 


‘* {@-8")’ _ (6 - 6) 
km + 1 | RR" | ne 
a Sm*(m — 1)k(k — 1)(km — 2k + 1) [(8 — 8”) 


t 
< 


3(km + 1)(km + 3) — pr 


S 
where $R' = km - 1 + m(k - 1)\delta + m(m - 1)k\delta'$, $R'' = km - 1 + m(k - 1)\delta + m(m - 1)k\delta''$.
Some calculations in the case $\delta' = 0$, $\delta'' = 1$ indicate that, when $\delta = \delta' = 0$,
the average sampling number will be minimized by taking m = 4.
THE FIRST PASSAGE PROBLEM FOR A CONTINUOUS
MARKOV PROCESS¹


By D. A. Darling and A. J. F. Siegert
Columbia University, University of Michigan, and Northwestern University


Summary. We give in this paper the solution to the first passage problem for
a strongly continuous temporally homogeneous Markov process X(t). If $T =
T_{ab}(x)$ is a random variable giving the time of first passage of X(t) from the
region a > X(t) > b when a > X(0) = x > b, we develop simple methods of
getting the distribution of T (at least in terms of a Laplace transform). From
the distribution of T the distribution of the maximum of X(t) and the range of
X(t) are deduced. These results yield, in an asymptotic form, solutions to
certain statistical problems in sequential analysis, nonparametric theory of
"goodness of fit," optional stopping, etc. which we treat as an illustration of the
theory.


1. Introduction. There are certain generalizations of the classical gambler’s 
ruin problem which appear in various guises in numerous applications—besides 
statistical problems there are physical applications in the theory of noise, in 
genetics, etc. The exact solution of the associated random walk (or Markov 
chain) problem is often analytically difficult, if not impossible to obtain, and 
one is usually content with asymptotic solutions. The nature of the asymptotic
solution is generally such that it is the solution to a Markov chain problem
in which the length of the steps, and the interval between them, approach zero
and which may in the limit be regarded as some sort of continuous stochastic
process.


This circumstance suggests we might solve directly the associated problem 
with regard to the stochastic process and so obtain the asymptotic solution to 
the Markov chain problem without the intervention of a limiting process. 
Aside from the difficulty of justifying the interchange of these limiting operations,
it turns out that this procedure is often quite feasible and leads to simple
solutions. Using this idea Doob [7] obtained in a direct way the Kolmogorov- 
Smirnov limit theorems and the principle was further exploited by Anderson 
and Darling [1]. The general principle is, of course, quite old, and in connection 
with random walk problems goes back at least to Rayleigh. 

A general feature of this method is that the analytical difficulties, if any, are 
revealed as more or less classical boundary value problems, eigenvalue problems, 
etc., this intrinsic nature of the problem often being masked by the discrete
approach. On the other hand, it suffers from the serious defect of giving no
information as to the difference between the actual solution and the asymptotic
one, information which is essential in the numerical applications.

In the present paper we treat the first passage (or ruin, or absorption proba- 
bility) problem for a general class of Markov processes (cf. Section 2) and 
obtain the solution in the form of a Laplace transform (Section 3). This Laplace 
transform is generally given as a simple function of the solutions to an ordinary 
differential equation (Section 4). The methods used are similar to those used 
in the discrete theory by Wald [17] (fundamental identity) and Feller [9] (re- 
newal and generating function techniques), but the analysis is considerably 
simplified, at least in a formal way, and not restricted to additive processes. 
It turns out that there is an intimate relationship between the one- and two- 
sided absorption probabilities, and the probability of eventual absorption in 
one of the boundaries. 

We illustrate the theory in Section 5 by solving a problem of Wald [17] in 
the sequential test of the mean of a normal population against a single alternative, 
the derivation of a nonparametric test used by Anderson and Darling [1] and 
the solution to the optional stopping problem (Robbins [15]). These problems 
are treated by solving the associated absorption problem with the Wiener- 
Einstein process and the Uhlenbeck process. 

In Section 6 we study the first passage moments which can be obtained by an 
expansion of the Laplace transforms or again through differential equations 
which can be explicitly solved in quadratures. There are some quite interesting 
relations between the moments.

In Section 7 we develop the distribution of the range which has been used by 
Feller [10] in a statistical study. 


2. Definitions, notations, assumptions, etc. Given a stochastic process X(t)
with X(0) = x, a > x > b, we define the first passage time $T_{ab}(x)$ as the random
variable

$$T = T_{ab}(x) = \sup\{t \mid a > X(\tau) > b,\ 0 \le \tau \le t\}.$$

We make the following assumptions about the stochastic process X(t).
A) X(t) has a transition probability

$$P(x \mid y, t) = \Pr\{X(t + s) < y \mid X(s) = x\}, \qquad t > 0,$$

satisfying the Chapman-Kolmogorov equation

$$P(x \mid y, t_1 + t_2) = \int_{-\infty}^{\infty} P(z \mid y, t_2)\, d_z P(x \mid z, t_1), \qquad t_1 > 0,\ t_2 > 0;$$

that is, X(t) is temporally homogeneous and stochastically definite (e.g.
Markovian).

B) X(t) is continuous with probability one (or is strongly continuous).

If X(t) satisfies A) sufficient conditions on P are known that it satisfy B), cf.
Doeblin [5], Fortet [11], Ito [12]. These conditions generally imply further that
P satisfies the diffusion equation of Section 4. Note that A) and B) imply the
existence of the random variable T, and we denote by $F_{ab}(x \mid t)$ the distribution
function of T, $F_{ab}(x \mid t) = \Pr\{T_{ab}(x) < t\}$.

In the work to follow we shall presume P and F have derivatives p, f; these
being the densities

$$p(x \mid y, t) = \frac{\partial}{\partial y} P(x \mid y, t), \qquad f_{ab}(x \mid t) = \frac{\partial}{\partial t} F_{ab}(x \mid t),$$

the modification of the results if these conditions are not met being more or
less immediate. The existence of a density for T has been established by Fortet
[11] under some circumstances. In this fundamental paper of Fortet on absorption
probabilities there is just one absorbing barrier, but the modification of his
results for two barriers is easy.
If $a = +\infty$ or $b = -\infty$ so that we have a one-sided absorption time we write
$T_c(x)$ as the corresponding random variable. That is

$$T_c(x) = \begin{cases} T_{\infty, c}(x) & \text{if } x > c, \\ T_{c, -\infty}(x) & \text{if } x < c, \end{cases}$$

with a corresponding distribution function $F_c(x \mid t)$ and density $f_c(x \mid t)$.
It may happen of course that absorption is not a certain event and that T is
not a proper random variable, that is $\Pr\{T_c(x) < \infty\} = F_c(x \mid \infty) < 1$ (or
similarly for $T_{ab}(x)$) and in this case we may still meaningfully treat the
conditional density of T, under the condition $T < \infty$.

We need, in addition, the conditional distribution of $T_{ab}(x)$ under the
condition that the absorption takes place into the barrier a, which we denote by
$F^{+}_{ab}(x \mid t)$:

$$F^{+}_{ab}(x \mid t) = \Pr\{T_{ab}(x) < t,\ T_{ab}(x) = T_a(x)\}$$

and $F^{-}_{ab}(x \mid t)$ will denote a similar expression for the lower barrier b. Hence

$$F_{ab}(x \mid t) = F^{+}_{ab}(x \mid t) + F^{-}_{ab}(x \mid t)$$

and the corresponding densities are $f^{+}_{ab}(x \mid t)$ and $f^{-}_{ab}(x \mid t)$.
We denote by a circumflex over the corresponding function its Laplace
transform on t; for example

$$\hat p(x \mid y, \lambda) = \int_0^{\infty} e^{-\lambda t}\, p(x \mid y, t)\, dt, \qquad \hat f_{ab}(x \mid \lambda) = \int_0^{\infty} e^{-\lambda t} f_{ab}(x \mid t)\, dt,$$

etc. The continuity of the process X(t) ensures the existence of these transforms.
3. The distribution of T. In this section we obtain the distribution of T in
terms of the transition density p of the process. Theorem 3.1 for the one-sided
barrier is due to Siegert [16] essentially.

THEOREM 3.1. If X(t) satisfies conditions A) and B), then $\hat p(x \mid y, \lambda)$ is a
product

$$\hat p(x \mid y, \lambda) = \begin{cases} u(x)\,u_1(y), & y > x, \\ v(x)\,v_1(y), & y < x, \end{cases}$$

and

$$\hat f_c(x \mid \lambda) = \begin{cases} \dfrac{u(x)}{u(c)}, & x < c, \\[2ex] \dfrac{v(x)}{v(c)}, & x > c. \end{cases} \tag{3.1}$$

We note that absorption may be uncertain and $\hat f_c(x \mid 0) = \Pr\{T_c(x) < \infty\}$
may be less than 1. A necessary and sufficient condition that absorption be
certain is that $\hat f_c(x \mid 0) = 1$.

To prove the theorem we use a renewal principle which is very old. We have
by A) and B) for y > c > x

$$p(x \mid y, t) = \int_0^{t} f_c(x \mid \tau)\, p(c \mid y, t - \tau)\, d\tau$$

by a direct enumeration of the paths going from x to y. On taking Laplace
transforms we obtain

$$\hat p(x \mid y, \lambda) = \hat f_c(x \mid \lambda)\, \hat p(c \mid y, \lambda), \qquad y > c > x,$$

and thus $\hat p(x \mid y, \lambda)$ is a function of x times a function of y, say $u(x)u_1(y)$, and
hence for y > c > x we get $\hat f_c(x \mid \lambda) = u(x)/u(c)$. Similarly, for y < c < x we
obtain $\hat f_c(x \mid \lambda) = v(x)/v(c)$ and hence for any c, x we obtain the conclusions to
the theorem. Finally it follows by cancelling any factor which depends only on
$\lambda$ that u(x) and v(x) are uniquely determined.

THEOREM 3.2. Let X(t) satisfy A) and B) and let the functions u(x) and v(x)
be as in Theorem 3.1. Then

$$\hat f^{+}_{ab}(x \mid \lambda) = \frac{v(b)u(x) - u(b)v(x)}{u(a)v(b) - u(b)v(a)}, \tag{3.2}$$

$$\hat f^{-}_{ab}(x \mid \lambda) = \frac{u(a)v(x) - v(a)u(x)}{u(a)v(b) - u(b)v(a)}, \tag{3.3}$$

$$\hat f_{ab}(x \mid \lambda) = \frac{v(x)\bigl(u(a) - u(b)\bigr) - u(x)\bigl(v(a) - v(b)\bigr)}{u(a)v(b) - u(b)v(a)}. \tag{3.4}$$







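To prove the theorem we consider the two expressions

$$f_a(x \mid t) = f^{+}_{ab}(x \mid t) + \int_0^{t} f^{-}_{ab}(x \mid \tau)\, f_a(b \mid t - \tau)\, d\tau,$$

$$f_b(x \mid t) = f^{-}_{ab}(x \mid t) + \int_0^{t} f^{+}_{ab}(x \mid \tau)\, f_b(a \mid t - \tau)\, d\tau,$$

which are established by a direct enumeration. Considering $f^{+}$ and $f^{-}$ as
unknown this pair of simultaneous integral equations is solved immediately by
taking Laplace transforms

$$\hat f_a(x \mid \lambda) = \hat f^{+}_{ab}(x \mid \lambda) + \hat f^{-}_{ab}(x \mid \lambda)\, \hat f_a(b \mid \lambda), \tag{3.5}$$

$$\hat f_b(x \mid \lambda) = \hat f^{-}_{ab}(x \mid \lambda) + \hat f^{+}_{ab}(x \mid \lambda)\, \hat f_b(a \mid \lambda), \tag{3.6}$$

which are 2 linear equations in 2 unknowns. On using the expressions in Theorem
3.1 for $\hat f_a$ and $\hat f_b$ we get (3.2) and (3.3) for $\hat f^{+}_{ab}$ and $\hat f^{-}_{ab}$, and the last expression
(3.4) is obtained by noting $\hat f_{ab} = \hat f^{+}_{ab} + \hat f^{-}_{ab}$.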
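The elimination step can be checked numerically. The following is a minimal sketch, not part of the original argument: it solves the pair (3.5), (3.6) as a 2 x 2 linear system at one fixed value of λ and compares the result with the closed forms (3.2) and (3.3). The function names and the numerical values are illustrative assumptions; u is taken as the increasing exponential and v as the decreasing one (the Brownian case), which is what (3.1) requires for x < c and x > c respectively.

```python
import math

def two_sided_transforms(u, v, a, b, x):
    """Solve (3.5)-(3.6) for (f_plus, f_minus) at one fixed lambda.

    u and v are the functions of Theorem 3.1 already evaluated at that
    lambda, so the one-sided transforms are u(x)/u(c) for x < c and
    v(x)/v(c) for x > c.
    """
    fa_x = u(x) / u(a)   # transform of T_a started at x (x < a)
    fb_x = v(x) / v(b)   # transform of T_b started at x (x > b)
    fa_b = u(b) / u(a)   # transform of T_a started at b
    fb_a = v(a) / v(b)   # transform of T_b started at a
    # System:  plus + fa_b * minus = fa_x,   fb_a * plus + minus = fb_x
    det = 1.0 - fa_b * fb_a
    plus = (fa_x - fa_b * fb_x) / det
    minus = (fb_x - fb_a * fa_x) / det
    return plus, minus

def closed_form(u, v, a, b, x):
    """Right-hand sides of (3.2) and (3.3)."""
    den = u(a) * v(b) - u(b) * v(a)
    plus = (v(b) * u(x) - u(b) * v(x)) / den
    minus = (u(a) * v(x) - v(a) * u(x)) / den
    return plus, minus

lam = 0.7                       # illustrative value of lambda
root = math.sqrt(2.0 * lam)
u = lambda y: math.exp(root * y)     # increasing solution (assumed Brownian case)
v = lambda y: math.exp(-root * y)    # decreasing solution
solved = two_sided_transforms(u, v, a=1.5, b=-0.5, x=0.2)
direct = closed_form(u, v, a=1.5, b=-0.5, x=0.2)
print(solved, direct)
```

For this choice of u and v the sum of the two components should also agree with the hyperbolic-cosine ratio that (3.4) reduces to for Brownian motion.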

A random variable closely related to T is the maximum of X(t), and we
define

$$M(x, t) = \sup_{0 \le \tau \le t} |X(\tau)|, \qquad X(0) = x. \tag{3.7}$$

Denoting the distribution of M by $G(x \mid \xi, t)$ we have clearly

$$G(x \mid \xi, t) = \Pr\{M(x, t) < \xi\} = \Pr\{T_{\xi, -\xi}(x) > t\} = 1 - F_{\xi, -\xi}(x \mid t), \qquad \xi > |x|, \tag{3.8}$$

so that the distribution of M is given directly through that of T. On taking
Laplace transforms of (3.8) we obtain the following corollary.

COROLLARY 3.3. $\hat G(x \mid \xi, \lambda) = \frac{1}{\lambda}\bigl(1 - \hat f_{\xi, -\xi}(x \mid \lambda)\bigr)$ for $\hat f_{\xi, -\xi}(x \mid \lambda)$ as in Theorem
3.2.

For a symmetrical process there is a specially simple formula for the Laplace
transform of $T_{a,-a}(x)$. A process X(t) is symmetrical if $p(x \mid y, t) = p(-x \mid -y, t)$
for all x, y, t. In this case u(x) = v(−x) and Theorem 3.2 yields the following
corollary.

COROLLARY 3.4. For a symmetrical process

$$\hat f_{a,-a}(x \mid \lambda) = \frac{u(x) + u(-x)}{u(a) + u(-a)}, \qquad |x| < a. \tag{3.9}$$

4. A differential equation. The function $p(x \mid y, t)$ will in most cases of interest
satisfy the so-called diffusion equation

$$\frac{\partial p}{\partial t} = A(x)\frac{\partial p}{\partial x} + \tfrac{1}{2}B^2(x)\frac{\partial^2 p}{\partial x^2} \tag{4.1}$$

with initial and boundary conditions $p(\infty \mid y, t) = p(-\infty \mid y, t) = 0$, $p(x \mid y, 0) =
\delta(x - y)$ (the Dirac function). Sufficient conditions on p, involving the
infinitesimal transition moments, are known in order that p satisfy an equation of
the type (4.1), and which generally further ensure the process is continuous with
probability one (cf. Doeblin [5]). When A and $B^2$ are given a priori, conditions
on them are known which ensure that (4.1) has a unique solution which is the
transition density of a process continuous with probability 1 (cf. Fortet [11]).
But general necessary and sufficient conditions are not known, and it does not
appear to be known under what conditions a process continuous with probability
one satisfies a diffusion equation. However, for specific processes these points
are generally easy to resolve.

The following theorem shows that for processes satisfying (4.1) u and v can
be determined from a differential equation.

THEOREM 4.1. If $p(x \mid y, t)$ uniquely satisfies (4.1) with the stated boundary
conditions and X(t) is continuous with probability one, the functions u(x) and v(x)
can be chosen as any two linearly independent solutions of the differential equation

$$\tfrac{1}{2}B^2(x)\frac{d^2 w}{dx^2} + A(x)\frac{dw}{dx} - \lambda w = 0. \tag{4.2}$$

To prove the theorem we note that if p satisfies (4.1) its Laplace transform
satisfies

$$\lambda \hat p = A\frac{d\hat p}{dx} + \tfrac{1}{2}B^2\frac{d^2 \hat p}{dx^2} \tag{4.3}$$

and indeed $-\hat p$ is the Green's solution to this equation over the infinite interval
$(-\infty < x < \infty)$. As a consequence, if $u(\infty) = v(-\infty) = 0$ and u(x), v(x)
satisfy (4.3) we obtain to a constant factor

$$\hat p(x \mid y, \lambda) = \begin{cases} u(y)\,v(x), & y \ge x, \\ v(y)\,u(x), & y \le x, \end{cases}$$

so that we obtain the previous expression (3.1) for $\hat f_c(x \mid \lambda)$ (with the labels u
and v interchanged) and consequently
we obtain (3.2), (3.3) and (3.4). But since (3.2), (3.3) and (3.4) are invariant
under any nonsingular linear transformation of u and v we obtain Theorem 4.1.
As for (3.9) we can choose for u(x) any solution to (4.2) provided u(x) and
u(−x) are linearly independent.
The customary way to obtain the first passage probability $f_{ab}(x \mid t)$ is to
solve (4.1) with the boundary conditions $p(a \mid y, t) = p(b \mid y, t) = 0$,
$p(x \mid y, 0) = \delta(x - y)$ and then we should have

$$F_{ab}(x \mid t) = 1 - \int_b^a p(x \mid y, t)\, dy$$

(cf. Fortet [11] for a proof and Lévy [14] for a general discussion). By using the
Laplace transform method this will give (3.4) for $\hat f_{ab}(x \mid \lambda)$, but it does not
appear to give $\hat f^{+}$ and $\hat f^{-}$.

Since $\hat f^{+}_{ab}(x \mid 0)$ is the probability that absorption in the barrier a occurs
before absorption in b, we should expect that, putting $\lambda = 0$ in (4.2), the solution
to

$$\tfrac{1}{2}B^2\frac{d^2 \phi}{dx^2} + A\frac{d\phi}{dx} = 0$$

with $\phi(a) = 1$, $\phi(b) = 0$ should give this probability. Khintchine [13] has proved
this result directly from the limiting case of a Markov chain without the use of
a stochastic process. Barnard [3] has considered this result in connection with a
sequential analysis problem.
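The boundary value problem for φ can be solved by two quadratures, since φ' is proportional to exp(−∫ 2A/B²). The sketch below is an illustration under our own assumptions, not a construction from the paper: the function name, the trapezoidal rule, and the test case of constant drift are all chosen for the example, and the starting point x is rounded up to the nearest grid point.

```python
import math

def absorption_probability(A, B2, a, b, x, steps=2000):
    """Probability of reaching a before b from x (b < x < a).

    Solves (1/2) B2(y) phi'' + A(y) phi' = 0, phi(a) = 1, phi(b) = 0.
    phi' is proportional to s(y) = exp(-int_b^y 2A/B2), hence
    phi(x) = int_b^x s / int_b^a s.  Trapezoidal rule throughout.
    """
    h = (a - b) / steps
    exponent = 0.0                      # running value of -int_b^y 2A/B2
    prev_rate = 2.0 * A(b) / B2(b)
    prev_s = 1.0                        # s(b) = 1
    total = 0.0                         # running value of int_b^y s
    upto_x = None
    for i in range(1, steps + 1):
        y = b + i * h
        rate = 2.0 * A(y) / B2(y)
        exponent -= 0.5 * h * (prev_rate + rate)
        s = math.exp(exponent)
        total += 0.5 * h * (prev_s + s)
        prev_rate, prev_s = rate, s
        if upto_x is None and y >= x - 1e-12:
            upto_x = total              # integral up to the grid point at x
    return upto_x / total

mu = 0.5                                # constant drift, unit variance (assumed)
numeric = absorption_probability(lambda y: mu, lambda y: 1.0, 1.0, -1.0, 0.0)
exact = (math.exp(2 * mu) - 1.0) / (math.exp(2 * mu) - math.exp(-2 * mu))
print(numeric, exact)
```

For constant A = μ and B² = σ² the quadratures can be done in closed form, and the value at x = 0 is the one used for the sequential test in Section 5, example c).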


5. A few examples.
a) The Wiener-Einstein process. Here X(t) is the free Brownian motion;
X(t) is Gaussian with mean 0 and covariance E(X(s)X(t)) = min (s, t) and its
transition density p satisfies the differential equation
$\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}$ (i.e. A = 0,
$B^2 = 1$). Two linearly independent solutions to $\frac{1}{2}w'' - \lambda w = 0$ are $u(x) =
e^{-\sqrt{2\lambda}\,x}$ and $v(x) = e^{\sqrt{2\lambda}\,x} = u(-x)$ and hence we obtain from (3.9)

$$\hat f_{a,-a}(x \mid \lambda) = \frac{\cosh \sqrt{2\lambda}\, x}{\cosh \sqrt{2\lambda}\, a}, \qquad |x| < a. \tag{5.1}$$

The inversion of this Laplace transform is easy, and we obtain

$$f_{a,-a}(x \mid t) = \frac{\pi}{a^2}\sum_{j=0}^{\infty} (-1)^j \left(j + \tfrac{1}{2}\right)\cos\left\{\left(j + \tfrac{1}{2}\right)\frac{\pi x}{a}\right\} e^{-(j+\frac{1}{2})^2\pi^2 t/2a^2}$$

and by integration on t

$$F_{a,-a}(x \mid t) = \Pr\{T_{a,-a}(x) < t\} = 1 - \frac{2}{\pi}\sum_{j=0}^{\infty}\frac{(-1)^j}{j + \frac{1}{2}}\cos\left\{\left(j + \tfrac{1}{2}\right)\frac{\pi x}{a}\right\} e^{-(j+\frac{1}{2})^2\pi^2 t/2a^2}.$$

This completely solves the case of Brownian motion for general barriers, since

$$F_{ab}(x \mid t) = F_{(a-b)/2,\,-(a-b)/2}\left(x - \frac{a + b}{2}\,\Big|\, t\right). \tag{5.2}$$

This result is well known (Bachelier [2], Lévy [14]) and alternatively can be
obtained by the method of images with the heat equation
$\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}$.
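The series for the distribution function is easy to evaluate. The following sketch is only an illustration: the function names and the truncation at 200 terms are our assumptions. It evaluates the series and, by integrating 1 − F term by term over t, a series for the mean of T; for standard Brownian motion started at x in (−a, a) that mean is known to be a² − x², which serves as a check.

```python
import math

def first_passage_cdf(x, a, t, terms=200):
    """F_{a,-a}(x | t) for standard Brownian motion, truncated series."""
    total = 0.0
    for j in range(terms):
        k = j + 0.5
        total += ((-1) ** j / k) * math.cos(k * math.pi * x / a) \
                 * math.exp(-(k * math.pi) ** 2 * t / (2.0 * a * a))
    return 1.0 - (2.0 / math.pi) * total

def mean_first_passage(x, a, terms=200):
    """E(T) as the integral of 1 - F over t, taken term by term."""
    total = 0.0
    for j in range(terms):
        k = j + 0.5
        total += ((-1) ** j / k ** 3) * math.cos(k * math.pi * x / a)
    return (4.0 * a * a / math.pi ** 3) * total

print(first_passage_cdf(0.0, 1.0, 1.0))
print(mean_first_passage(0.3, 1.0), 1.0 - 0.3 ** 2)
```

The series converges quickly except for very small t, where many terms are needed and the method of images is the more convenient representation.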


b) The Uhlenbeck process. Here X(t) is stationary, Markovian, and Gaussian,
with mean 0 and covariance $E(X(s)X(t)) = e^{-|t-s|}$ and the transition density
satisfies (4.1) with $B^2 = 2$, A = −x. Solutions to

$$\frac{d^2 w}{dx^2} - x\frac{dw}{dx} - \lambda w = 0$$

are $u(x) = e^{x^2/4}D_{-\lambda}(x)$ and $v(x) = e^{x^2/4}D_{-\lambda}(-x)$ where $D_\nu(z)$ is the Weber
function (cf. Whittaker and Watson [18]). Hence (3.9) gives

$$\hat f_{a,-a}(x \mid \lambda) = \exp\left\{\frac{x^2 - a^2}{4}\right\}\frac{D_{-\lambda}(x) + D_{-\lambda}(-x)}{D_{-\lambda}(a) + D_{-\lambda}(-a)}, \tag{5.3}$$

but it appears very difficult to invert this transform. For the particular case
x = 0 this result (5.3) was obtained from a limiting case of an Ehrenfest urn
scheme describing molecular equilibrium by Bellman and Harris [4].

c) A problem of Wald in sequential analysis. Let $X_1, X_2, \cdots$ be independent
random variables, normally distributed with an unknown mean $\theta$ and a known
variance $K^2$. That is, the density of $X_i$ is

$$\phi(x, \theta) = \frac{1}{\sqrt{2\pi}\,K}\, e^{-(x - \theta)^2/2K^2}.$$

According to the sequential likelihood ratio test of Wald, in order to test the
hypothesis $H_1$ that $\theta = \theta_1$ against the hypothesis $H_2$: $\theta = \theta_2$ we consider random
variables

$$Z_i = \log\frac{\phi(X_i, \theta_1)}{\phi(X_i, \theta_2)} = \frac{\theta_1 - \theta_2}{K^2}\left(X_i - \frac{\theta_1 + \theta_2}{2}\right)$$

and let $S_j = Z_1 + Z_2 + \cdots + Z_j$. Then for a > 0 > b we study the random
variable N defined as the smallest integer for which either $S_N > a$ or $S_N < b$
and determine for this N the probabilities of these outcomes.

Now

$$E(Z_i) = \frac{\theta_1 - \theta_2}{K^2}\left(\theta - \frac{\theta_1 + \theta_2}{2}\right) = \mu, \tag{5.4}$$

$$\operatorname{Var}(Z_i) = \left(\frac{\theta_1 - \theta_2}{K}\right)^2 = \sigma^2, \tag{5.5}$$

so that this suggests we study a Gaussian process S(t) with independent
increments and with $E(S(t)) = \mu t$ and $\operatorname{Var}(S(t)) = \sigma^2 t$ (a linear transformation of
the Wiener process). Then the joint distribution of $S_1, S_2, \cdots, S_j$ is the same
as the joint distribution of S(1), S(2), ···, S(j), and in place of finding the
distribution of N we approximate to it by finding the distribution of the
absorption time $T_{ab}(0)$ in connection with the process S(t). It should be remarked
that the nature of this approximation is quite different from Wald's
approximation of "neglecting the excess" since the process S(t) may leave and re-enter
one of the barriers between two consecutive integer time instants.

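The differential equation satisfied by the transition density p of the process
S(t) is

$$\frac{\partial p}{\partial t} = \mu\frac{\partial p}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2 p}{\partial x^2};$$

that is, $A = \mu$, $B^2 = \sigma^2$, and (4.2) becomes

$$\frac{\sigma^2}{2}\frac{d^2 w}{dx^2} + \mu\frac{dw}{dx} - \lambda w = 0. \tag{5.6}$$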
It is simple to solve this equation with constant coefficients and since the two
roots of $\frac{\sigma^2}{2}\xi^2 + \mu\xi - \lambda = 0$ are

$$\xi_1 = \frac{-\mu + \sqrt{\mu^2 + 2\sigma^2\lambda}}{\sigma^2}, \qquad \xi_2 = \frac{-\mu - \sqrt{\mu^2 + 2\sigma^2\lambda}}{\sigma^2},$$

two linearly independent solutions to (5.6) are $u(x) = e^{\xi_1 x}$ and $v(x) = e^{\xi_2 x}$ and
hence by Theorem 3.2 we immediately obtain $\hat f^{+}$, $\hat f^{-}$, and $\hat f$ and the problem
is formally solved. The expressions are to be considered for x = 0, and (3.2)
gives for x = 0, with this u(x), v(x),

$$\hat f^{+}_{ab}(0 \mid \lambda) = \frac{e^{\xi_2 b} - e^{\xi_1 b}}{e^{\xi_1 a + \xi_2 b} - e^{\xi_1 b + \xi_2 a}}$$

and at $\lambda = 0$ this gives the probability of being absorbed into the barrier a
before b, and we abbreviate $L^{+}(\theta) = \hat f^{+}_{ab}(0 \mid 0)$ for this probability. For $\lambda = 0$
the two roots are 0 and $-2\mu/\sigma^2$, so that

$$L^{+}(\theta) = \frac{e^{-(2\mu/\sigma^2)b} - 1}{e^{-(2\mu/\sigma^2)b} - e^{-(2\mu/\sigma^2)a}}. \tag{5.7}$$

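According to the test of Wald we should choose the barriers a and b so that
$L^{+}(\theta_1) = 1 - \beta$, $L^{+}(\theta_2) = \alpha$ where $\alpha$, $\beta$ are given positive numbers with
$\alpha + \beta < 1$. For $\theta = \theta_1$ we have $2\mu = \sigma^2$ and for $\theta = \theta_2$ we have $2\mu = -\sigma^2$ from
(5.4) and (5.5). Hence from (5.7) we get as two equations for a and b

$$\frac{e^{-b} - 1}{e^{-b} - e^{-a}} = 1 - \beta, \qquad \frac{e^{b} - 1}{e^{b} - e^{a}} = \alpha,$$

which are easily solved to give

$$a = \log\frac{1 - \beta}{\alpha}, \qquad b = \log\frac{\beta}{1 - \alpha}.$$

These are the formulas of Wald.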


From (5.4) and (5.5) we see that $2\mu/\sigma^2 = (2\theta - (\theta_1 + \theta_2))/(\theta_1 - \theta_2)$ which we
denote by $h(\theta)$. Then setting $A = (1 - \beta)/\alpha$, $B = \beta/(1 - \alpha)$ we obtain from
(5.7)

$$L^{+}(\theta) = \frac{B^{-h(\theta)} - 1}{B^{-h(\theta)} - A^{-h(\theta)}},$$

the probability of absorption in the barrier a, which is the power of the test
(i.e., the probability of rejecting $H_2$: $\theta = \theta_2$ when $\theta$ is the true mean) and
$1 - L^{+}(\theta) = L^{-}(\theta)$ is the expression given by Wald for the operating
characteristic of test.
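As a check on the algebra, the barriers and the absorption probability can be evaluated directly. The sketch below is illustrative only: the function names and the numerical values of θ₁, θ₂, α, β are our assumptions. Note that the variance K² cancels from h(θ) and so does not appear.

```python
import math

def wald_barriers(alpha, beta):
    """Barriers a = log((1 - beta)/alpha), b = log(beta/(1 - alpha))."""
    a = math.log((1.0 - beta) / alpha)
    b = math.log(beta / (1.0 - alpha))
    return a, b

def absorb_upper(theta, theta1, theta2, alpha, beta):
    """L+(theta) from (5.7), with 2 mu / sigma^2 = h(theta)."""
    a, b = wald_barriers(alpha, beta)
    h = (2.0 * theta - (theta1 + theta2)) / (theta1 - theta2)
    if abs(h) < 1e-12:
        # mu = 0: the limit of (5.7) as h -> 0 is -b / (a - b)
        return -b / (a - b)
    return (math.exp(-h * b) - 1.0) / (math.exp(-h * b) - math.exp(-h * a))

theta1, theta2, alpha, beta = 1.0, 0.0, 0.05, 0.10   # illustrative values
print(wald_barriers(alpha, beta))
print(absorb_upper(theta1, theta1, theta2, alpha, beta))  # should equal 1 - beta
print(absorb_upper(theta2, theta1, theta2, alpha, beta))  # should equal alpha
```

Evaluating the function at θ₁ and θ₂ recovers 1 − β and α, which is exactly the pair of conditions used to fix a and b.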
For the distribution of T (approximate number of observations necessary to
terminate the test) we use the expression (3.4) with x = 0 to give

$$E(e^{-\lambda T}) = \hat f_{ab}(0 \mid \lambda) = \frac{e^{\xi_1 a} - e^{\xi_1 b} - e^{\xi_2 a} + e^{\xi_2 b}}{e^{\xi_1 a + \xi_2 b} - e^{\xi_1 b + \xi_2 a}}$$

which can be inverted to give a rather complicated expression:

$$F_{ab}(0 \mid t) = \Pr\{T_{ab}(0) < t\} = 1 - \frac{\sigma^2\pi}{(a - b)^2}\sum_{n=1}^{\infty}\frac{n(-1)^n\left[e^{a\mu/\sigma^2}\sin\dfrac{n\pi b}{a - b} - e^{b\mu/\sigma^2}\sin\dfrac{n\pi a}{a - b}\right]}{\dfrac{\mu^2}{2\sigma^2} + \dfrac{\sigma^2 n^2\pi^2}{2(a - b)^2}}\exp\left(-\left\{\frac{\mu^2}{2\sigma^2} + \frac{\sigma^2 n^2\pi^2}{2(a - b)^2}\right\}t\right).$$

But the moments are easy to obtain by expanding about $\lambda = 0$, since we have
the moment generating function of T (note that T is a proper random variable,
that is, absorption is a certain event since $\hat f_{ab}(0 \mid 0) = 1$). An alternative way is
to use the result of the next section which gives the moments as the solutions to
differential equations. If we let $m(x) = E(T_{ab}(x))$ then from (6.6) it follows
that m satisfies the differential equation $\frac{1}{2}\sigma^2 m''(x) + \mu m'(x) = -1$ with m(a) =
m(b) = 0.
Assuming first that $\mu \ne 0$ we obtain by solving this equation

$$m(0) = E(T) = \frac{1}{\mu}\left(aL^{+}(\theta) + bL^{-}(\theta)\right)$$

while for $\mu = 0$, that is $\theta = (\theta_1 + \theta_2)/2$,

$$E(T) = -\frac{ab}{\sigma^2} = -\frac{abK^2}{(\theta_1 - \theta_2)^2}.$$

Here $L^{-}(\theta) = 1 - L^{+}(\theta)$ is the probability of absorption in the barrier b, as
before, and a, b, $\mu$, $\sigma$ have their former significance.

It is rather remarkable that despite the differing nature of the approximations 
of Wald and the approximations by presuming a continuous process as here, 
they should give the same formulas. 
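The formula for E(T) can be verified against the differential equation it is meant to solve. The sketch below is an illustration under stated assumptions: the expression is shifted to a general starting point x, the function names are ours, and the check uses central differences with an arbitrary step h.

```python
import math

def upper_probability(x, a, b, mu, sigma2):
    """Probability of reaching a before b from x, drift mu != 0."""
    s = lambda y: math.exp(-2.0 * mu * y / sigma2)
    return (s(b) - s(x)) / (s(b) - s(a))

def mean_exit_time(x, a, b, mu, sigma2):
    """m(x) = ((a - x) L+ + (b - x) L-) / mu, the text's formula started at x."""
    p = upper_probability(x, a, b, mu, sigma2)
    return ((a - x) * p + (b - x) * (1.0 - p)) / mu

def ode_residual(x, a, b, mu, sigma2, h=1e-3):
    """(1/2) sigma^2 m'' + mu m' + 1 by central differences; near 0 if m solves the ODE."""
    m = lambda y: mean_exit_time(y, a, b, mu, sigma2)
    d1 = (m(x + h) - m(x - h)) / (2.0 * h)
    d2 = (m(x + h) - 2.0 * m(x) + m(x - h)) / (h * h)
    return 0.5 * sigma2 * d2 + mu * d1 + 1.0

print(mean_exit_time(0.0, 2.0, -1.5, 0.4, 1.0))
print(ode_residual(0.3, 2.0, -1.5, 0.4, 1.0))
```

The residual vanishes because the absorption probability itself solves the λ = 0 equation, so only the linear terms in x contribute.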

d) A nonparametric test in "goodness of fit." In a test related to the
Kolmogorov-Smirnov tests the following important absorption probability problem
arose. If X(t) is the Uhlenbeck process (cf. example b) above) calculate the
probability

$$b(\xi \mid t) = \Pr\{|X(\tau)| < \xi,\ 0 \le \tau \le t\}$$

where X(0) has its stationary distribution. Thus we have the problem of finding
the distribution of the random variable M(x, t) defined by (3.7) whose
distribution function is $G(x \mid \xi, t)$ as in (3.8).
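For $|x| < \xi$ we have $G(x \mid \xi, t) = 1 - F_{\xi,-\xi}(x \mid t)$ and since for $|x| \ge \xi$ we
have G = 0 we define F = 1 for $|x| \ge \xi$. The stationary distribution of X(t)
is N(0, 1), that is, has a density $\phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$ and hence

$$b(\xi \mid t) = \int_{-\infty}^{\infty}\phi(x)\,G(x \mid \xi, t)\, dx = 1 - \int_{-\infty}^{\infty}\phi(x)\,F_{\xi,-\xi}(x \mid t)\, dx.$$

On taking Laplace transforms we get

$$\hat b(\xi \mid \lambda) = \frac{1}{\lambda} - \int_{-\infty}^{\infty}\phi(x)\,\hat F_{\xi,-\xi}(x \mid \lambda)\, dx = \frac{1}{\lambda}\left\{1 - \int_{-\infty}^{\infty}\phi(x)\,\hat f_{\xi,-\xi}(x \mid \lambda)\, dx\right\}$$

and hence from (5.3) we have

$$\hat b(\xi \mid \lambda) = \frac{1}{\lambda}\sqrt{\frac{2}{\pi}}\left\{\int_0^{\xi} e^{-x^2/2}\, dx - \frac{e^{-\xi^2/4}}{D_{-\lambda}(\xi) + D_{-\lambda}(-\xi)}\int_0^{\xi} e^{-x^2/4}\bigl(D_{-\lambda}(x) + D_{-\lambda}(-x)\bigr)\, dx\right\}.$$

This result was given, without proof, in [1].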

e) The optional stopping problem. In [15] Robbins outlined the optional
stopping problem. Let, as in example c) above, the problem be that of testing
the mean of a normal universe with known variance, say $\sigma^2$, but instead of
testing the hypotheses $H_1$ and $H_2$ of example c) we have the hypothesis $H_1$:
$\theta = 0$ to test against $H_2$: $\theta \ne 0$ (Robbins considers $H_2$: $\theta > 0$). As sketched by
Robbins the basic problem is to calculate the probability, when $\theta = 0$, $S_n =
X_1 + X_2 + \cdots + X_n$,

$$g(n_1, n_2, \alpha) = \Pr\{|S_n| < \alpha\sigma\sqrt{n},\ n_1 \le n \le n_2\}$$

for given $\alpha$, $n_1$, and $n_2$. For the case of $S_n$ instead of $|S_n|$ Robbins gave an
inequality, and here we give an approximate and an asymptotic result.

The random variables $\{S_n/\sigma\sqrt{n}\}$, $n = n_1, n_1 + 1, \cdots, n_2$ have mean 0,
variance 1, are normally distributed and form a Markov chain with covariance

$$E\left(\frac{S_j}{\sigma\sqrt{j}}\cdot\frac{S_n}{\sigma\sqrt{n}}\right) = \frac{\min(j, n)}{\sqrt{jn}} = e^{-|\frac{1}{2}\log j - \frac{1}{2}\log n|}.$$

Hence their joint distribution is the same as the joint distribution of $X(\frac{1}{2}\log n_1)$,
$X(\frac{1}{2}\log(n_1 + 1))$, ···, $X(\frac{1}{2}\log n_2)$ where X(t) is the Uhlenbeck process (cf.
examples b) and d) above). Hence we have, using approximations like those in
example c),

$$g(n_1, n_2, \alpha) \cong \Pr\left\{|X(t)| < \alpha,\ \tfrac{1}{2}\log n_1 \le t \le \tfrac{1}{2}\log n_2\right\} = b\left(\alpha \,\Big|\, \tfrac{1}{2}\log\frac{n_2}{n_1}\right),$$

where $b(\xi \mid t)$ is the function of example d) and of which we have the Laplace
transform.
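The identification with the Uhlenbeck process rests on the covariance identity above, which is elementary to confirm numerically. In this small sketch the function names and the sample pairs (j, n) are illustrative assumptions.

```python
import math

def partial_sum_correlation(j, n):
    """Correlation of S_j/(sigma sqrt(j)) and S_n/(sigma sqrt(n)) for i.i.d. terms."""
    return min(j, n) / math.sqrt(j * n)

def uhlenbeck_covariance(s, t):
    """Covariance exp(-|t - s|) of the stationary Uhlenbeck process."""
    return math.exp(-abs(t - s))

pairs = [(1, 1), (3, 10), (50, 7), (100, 400)]
for j, n in pairs:
    lhs = partial_sum_correlation(j, n)
    rhs = uhlenbeck_covariance(0.5 * math.log(j), 0.5 * math.log(n))
    print(j, n, lhs, rhs)
```

The time change n → ½ log n turns the ratio min(j, n)/√(jn) into a function of the difference of the two times only, which is what stationarity requires.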
It is also possible to give an exact asymptotic result which is applicable even
if the variables are not normally distributed, but merely have mean 0, variance
$\sigma^2$, and obey the central limit theorem (e.g., if they are identically distributed).
Let $n_1 \to \infty$, $n_2 \to \infty$, $n_1/n_2 \to t$, 0 < t < 1, and consider a sequence $\{t_n\}$, $n =
n_1, n_1 + 1, \cdots, n_2$ defined for fixed $n_2$ by $t_n = n/n_2$; this sequence depends
upon $n_2$, $\{t_n\} = \{t_{n,n_2}\}$, and for $n_2 \to \infty$ becomes everywhere dense in the interval
$t \le \tau \le 1$. That is, given any $\tau$ ($t \le \tau \le 1$) we can choose an element $\tau_{n_2}$ from
$\{t_n\}$ such that $\lim_{n_2\to\infty}\tau_{n_2} = \tau$.
Then since

$$g(n_1, n_2, \alpha) = \Pr\left\{\frac{|S_{n_2 t_n}|}{\sigma\sqrt{n_2}} < \alpha\sqrt{t_n},\ n_1 \le n \le n_2\right\}$$

it will follow from a theorem of Donsker [6] that the limiting distribution g
can be expressed as the distribution of the corresponding Wiener functional.
Hence for $n_1 \to \infty$, $n_2 \to \infty$, $n_1/n_2 \to t$, 0 < t < 1,

$$g(n_1, n_2, \alpha) \to \Pr\{|W(\tau)| < \alpha\sqrt{\tau},\ t \le \tau \le 1\},$$

where W(t) is the Wiener-Einstein process (cf. example a) above).
Now if X(t) is the Uhlenbeck process (cf. example b)) we can write $W(t) =
\sqrt{t}\,X(\frac{1}{2}\log t)$ (Doob [8]) and thus

$$\lim g = \Pr\left\{|X(\tfrac{1}{2}\log\tau)| < \alpha,\ t \le \tau \le 1\right\} = \Pr\left\{|X(\tau)| \le \alpha,\ 0 \le \tau \le \tfrac{1}{2}\log\frac{1}{t}\right\} = b\left(\alpha \,\Big|\, \tfrac{1}{2}\log\frac{1}{t}\right),$$

and since $1/t = \lim n_2/n_1$ we obtain $g \sim b(\alpha \mid \frac{1}{2}\log n_2/n_1)$, the approximate
expression deduced above. It seems somewhat striking that these two expressions
should agree, being deduced from essentially distinct principles.


6. On the moments of T. In the preceding work the distributions were
generally expressed as Laplace transforms which are often difficult to invert
but which give immediate information about the moments of T.

In the present section we suppose that $\Pr\{T < \infty\} = 1$, that is, that T is a
proper random variable, as otherwise the moments will not exist. If the
corresponding Laplace transform is 1 for $\lambda = 0$ the variable is proper. Let us put

$$t^{(n)}_{ab}(x) = E\bigl(T^n_{ab}(x)\bigr), \qquad t^{(n)}_c(x) = E\bigl(T^n_c(x)\bigr)$$

which we suppose to exist for $n \le n_0$. We have by a series expansion

$$\hat f_{ab}(x \mid \lambda) = \sum_{n=0}^{n_0}\frac{t^{(n)}_{ab}(x)}{n!}(-\lambda)^n + o(\lambda^{n_0}), \qquad \hat f_c(x \mid \lambda) = \sum_{n=0}^{n_0}\frac{t^{(n)}_c(x)}{n!}(-\lambda)^n + o(\lambda^{n_0}), \tag{6.1}$$

from which the moments are determined.

From equations (3.5) and (3.6) it is possible to express $\hat f_{ab} = \hat f^{+}_{ab} + \hat f^{-}_{ab}$ in
terms of the transforms $\hat f_c$ and from this fact we can express the moments $t^{(n)}_{ab}$
in terms of the one-sided first passage moments $t^{(n)}_c$. We get in fact from (3.5)
and (3.6)

$$\hat f_{ab}(x \mid \lambda) = \frac{\hat f_a(x \mid \lambda)\bigl(\hat f_b(a \mid \lambda) - 1\bigr) + \hat f_b(x \mid \lambda)\bigl(\hat f_a(b \mid \lambda) - 1\bigr)}{\hat f_a(b \mid \lambda)\,\hat f_b(a \mid \lambda) - 1} \tag{6.2}$$

and it follows that $t^{(n)}_{ab}(x)$ will be given by an algebraic combination of $t^{(k)}_a(x)$
and $t^{(j)}_b(x)$ for $k \le n$, $j \le n$, provided these moments exist. But it should be
remarked that $t^{(n)}_{ab}(x)$ will exist in general for finite a, b, even though $t^{(k)}_c(x)$ may
not, as the simple Wiener-Einstein process, for which $t^{(k)}_c(x) = \infty$ for $k \ge 1$,
shows.

In particular for n = 1, where we put $t^{(1)} = t$, we get for the mean first passage
time by a simple expansion of (6.2),

$$t_{ab}(x) = \frac{t_a(x)\,t_b(a) + t_b(x)\,t_a(b) - t_a(b)\,t_b(a)}{t_a(b) + t_b(a)}. \tag{6.3}$$

This formula leads to interesting consequences. Let a and b be such that
$t_a(b) = t_b(a)$. Then since $t_b(a) = t_x(a) + t_b(x)$, (6.3) becomes simply

$$t_{ab}(x) = \frac{t_a(x) - t_x(a)}{2}. \tag{6.4}$$

The right-hand side of (6.4) is independent of b, and since $t_{ab}(x) \ge 0$ we have
the result that when a > x > b and $t_a(b) = t_b(a)$ then $t_a(x) \ge t_x(a)$. Thus it is
possible in a stationary process that the mean length of time it takes to go
from a less probable state to a more probable state for the first time is longer
than that it takes to reverse the journey. It is simple to construct processes for
which this result obtains, for example, one in which the stationary density is
symmetric and bimodal.

It is possible also to express the probability of absorption in the barrier a
before b by means of the one-sided first passage moments. Since $\hat f^{+}_{ab}(x \mid 0)$ is
this probability we obtain from (3.5) and (3.6)

$$\hat f^{+}_{ab}(x \mid \lambda) = \frac{\hat f_a(b \mid \lambda)\,\hat f_b(x \mid \lambda) - \hat f_a(x \mid \lambda)}{\hat f_b(a \mid \lambda)\,\hat f_a(b \mid \lambda) - 1};$$

hence letting $\lambda \to 0$ we obtain the conclusion that if the first passage moments
exist the probability of absorption in a before b is given by
$P = (t_a(b) + t_b(x) - t_a(x))/(t_a(b) + t_b(a))$.

Since the expressions $\hat f$, $\hat f^{+}$, and $\hat f^{-}$ satisfy the differential equation (4.2) if
the corresponding transition density p satisfies (4.1) it is possible to find the
moments $t^{(n)}$ directly through a differential equation, and this often affords a
method that is computationally more feasible than a direct evaluation of $\hat f$.
We have in fact the following theorem.

THEOREM 6.1. Let X(t) satisfy the hypotheses of Theorem 4.1. Then if $T = T_{ab}(x)$
is a proper random variable whose moments of order $n \le n_0$ exist, $t^{(n)}_{ab}$
satisfies the system

$$\tfrac{1}{2}B^2\frac{d^2 t^{(n)}}{dx^2} + A\frac{dt^{(n)}}{dx} = -n\,t^{(n-1)}, \qquad t^{(0)} = 1, \qquad t^{(n)}_{ab}(a) = t^{(n)}_{ab}(b) = 0, \quad n > 0. \tag{6.5}$$

To prove the theorem we merely substitute the expansion (6.1) in the
differential equation (4.2) and equate the coefficient of $\lambda^n$ to zero.

The system (6.5) is particularly easy to solve since the substitution $z^{(n)} =
dt^{(n)}/dx$ renders each equation linear of the first order, and the solution can be
written immediately in quadratures. Starting with n = 1 each $t^{(n)}$ can be
obtained in turn in quadratures from the previous $t^{(k)}$ (k < n). In particular for
n = 1 we have

$$\tfrac{1}{2}B^2\frac{d^2 t}{dx^2} + A\frac{dt}{dx} = -1, \qquad t = t_{ab}(x), \qquad t(a) = t(b) = 0, \tag{6.6}$$

a result we have used already in Section 5, Example c).
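The solution of (6.6) by quadratures can be sketched numerically as follows. This is an illustration under our own assumptions, not the authors' computation: the function name, the uniform grid, and the trapezoidal rule are chosen for the example. With z = dt/dx the equation reads z' + (2A/B²)z = −2/B², which is solved with the integrating factor W = exp(∫ 2A/B²); the free constant is fixed by t(a) = 0.

```python
import math

def mean_first_passage_grid(A, B2, a, b, steps=2000):
    """Solve (1/2) B2 t'' + A t' = -1 with t(a) = t(b) = 0 by quadratures.

    Returns the grid points ys (from b to a) and the values ts on the grid.
    """
    h = (a - b) / steps
    ys = [b + i * h for i in range(steps + 1)]
    rate = [2.0 * A(y) / B2(y) for y in ys]
    # Integrating factor W(y) = exp(int_b^y 2A/B2)
    logw = [0.0]
    for i in range(1, steps + 1):
        logw.append(logw[-1] + 0.5 * h * (rate[i - 1] + rate[i]))
    W = [math.exp(val) for val in logw]
    # I(y) = int_b^y 2W/B2, so that W z = C - I
    src = [2.0 * W[i] / B2(ys[i]) for i in range(steps + 1)]
    I = [0.0]
    for i in range(1, steps + 1):
        I.append(I[-1] + 0.5 * h * (src[i - 1] + src[i]))
    # t(x) = C * P(x) - Q(x), with P = int_b^x 1/W and Q = int_b^x I/W
    P, Q = [0.0], [0.0]
    for i in range(1, steps + 1):
        P.append(P[-1] + 0.5 * h * (1.0 / W[i - 1] + 1.0 / W[i]))
        Q.append(Q[-1] + 0.5 * h * (I[i - 1] / W[i - 1] + I[i] / W[i]))
    C = Q[-1] / P[-1]                   # enforces t(a) = 0
    ts = [C * P[i] - Q[i] for i in range(steps + 1)]
    return ys, ts

# Wiener-Einstein case A = 0, B^2 = 1 on (-1, 1): t(x) = 1 - x^2
ys, ts = mean_first_passage_grid(lambda y: 0.0, lambda y: 1.0, 1.0, -1.0)
print(ys[1000], ts[1000])
```

The same routine applies unchanged to state-dependent coefficients, for instance A(x) = −x and B² = 2 for the Uhlenbeck process, where no elementary closed form is available.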


7. The range of X(t). In this section we develop a formula for the distribution
of the random variable

$$R(x, t) = \sup_{0 \le \tau \le t} X(\tau) - \inf_{0 \le \tau \le t} X(\tau)$$

which is called the range of X(t), or the oscillation of X(t), and we denote its
distribution by $\Phi(x \mid r, t) = \Pr\{R(x, t) < r\}$. Note that this probability exists
if X(t) satisfies conditions A) and B) of Section 2.

A treatment of the random variable R for the Wiener-Einstein case has
been given by Feller [10] in a statistical application, and the present section
solves a problem he posed on finding the distribution of R for other processes.

Again we presume the existence of a density for R, say $\phi(x \mid r, t) =
\partial\Phi(x \mid r, t)/\partial r$, only to expedite the analysis. It is not difficult to show that the
existence of a density for T implies that for R.

THEOREM 7.1. Let X(t) satisfy conditions A) and B) and let $\phi(x \mid r, t)$ be the
density of R(x, t). Then for $\hat f_{ab}(x \mid \lambda)$ as in (3.4) we have

$$\hat\phi(x \mid r, \lambda) = -\frac{1}{\lambda}\frac{\partial^2}{\partial r^2}\int_{x - (r/2)}^{x + (r/2)}\hat f_{v + (r/2),\, v - (r/2)}(x \mid \lambda)\, dv. \tag{7.1}$$

We note that $\hat\Phi(x \mid r, \lambda)$, being merely $\int_0^r \hat\phi(x \mid u, \lambda)\, du$, is given immediately
since $\hat\phi$ is expressed as a derivative.
The starting point of the proof is the formula

$$\phi(x \mid r, t) = -\int_{x - r}^{x}\left[\frac{\partial^2}{\partial a\,\partial b}\bigl(1 - F_{ab}(x \mid t)\bigr)\right]_{a = b + r} db$$
= da db Jamb+r 
which is established readily by an enumeration of cases. The existence of the
derivative (under the integral sign) follows from the existence of the density of
X(t) at a and b, for when $\delta > 0$


/ 


Fa(x | t) — Fausa(x | t) = Prfa < X@ < a+b, X(7r) >b0E7 SE th. 


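On taking the Laplace transform of the preceding expression (which can be
done under the integration and differentiation operations) we obtain

$$\hat\phi(x \mid r, \lambda) = \frac{1}{\lambda}\int_{x - r}^{x}\left[\frac{\partial^2}{\partial a\,\partial b}\hat f_{ab}(x \mid \lambda)\right]_{a = b + r} db$$

and the conclusions to the theorem follow by noting the identity

$$\left[\frac{\partial^2}{\partial a\,\partial b}\hat f_{ab}(x \mid \lambda)\right]_{a = b + r} = \frac{\partial^2}{\partial b\,\partial r}\hat f_{b + r,\, b}(x \mid \lambda) - \frac{\partial^2}{\partial r^2}\hat f_{b + r,\, b}(x \mid \lambda).$$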


As an application we consider the Wiener-Einstein process for which we have
shown ((5.1) and (5.2))

$$\hat f_{ab}(x \mid \lambda) = \frac{\cosh\sqrt{2\lambda}\left(x - \dfrac{a + b}{2}\right)}{\cosh\sqrt{2\lambda}\left(\dfrac{a - b}{2}\right)}$$

and here (7.1) gives on performing the integration,

$$\hat\phi(x \mid r, \lambda) = -\frac{\sqrt{2}}{\lambda^{3/2}}\frac{\partial^2}{\partial r^2}\tanh\sqrt{\frac{\lambda}{2}}\, r, \tag{7.2}$$

independent of x since the process is spatially homogeneous. This latter transform
is easy to invert, and we have


| —2rt(j + 4)°) 
i 1\? exp 9 ) 
TT 2 . iar 


\ 


8 x 


Zz (—1)’ Fe 


V 2nt j= 
these two expressions being related by Theta function identities, and the second
being given by Feller [10]. For the moments we get from (7.2) immediately
$E(R^n) = c_n t^{n/2}$ where


sn (2 2 


: p 5 tanh p dp 
an 1) 2 dp 
\st ) 


so that, for example, $E(R) = \sqrt{8t/\pi}$, $E(R^2) = 4t\log 2$, etc.
ON STEIN'S TWO-STAGE SAMPLING SCHEME¹
By B. M. SEELBINDER 
University of North Carolina 


Summary. This paper gives a method for determining the size of the first 
part of a two-stage sample for estimating the population mean with a given 
accuracy. The method is based on a scheme of Stein [4]. The tables necessary 
for application of this method have been given. A more detailed summary will 
be found at the end of the Introduction. 


1. Introduction. When it is desired to investigate the characteristics of a
specified population on the basis of a sample, the size of the sample depends on
the accuracy which it is desired to attain. Thus, in order to estimate the mean
or average of the population we may fix an allowable discrepancy $d$, and a risk
or significance level $\alpha$, such that the chance of the absolute difference between
the true mean and the estimate exceeding the allowable discrepancy $d$ is not
greater than $\alpha$. Thus

(1)  $P\{|T - m| \ge d\} \le \alpha,$

where $m$ is the true mean and $T$ is its estimate.


One approach to the problem of sample size is to use a two-stage sampling
plan, the size of the second part of the sample depending on the information
supplied about the variance of the population by the first part of the sample.
Stein has suggested such a two-stage plan, but in his work the size $n_1$ of the
first part of the sample is left to the discretion of the experimenter. If $n$ is the
total size of the sample (including both parts), then the expected value of $n$
depends on $n_1$. It would thus be worthwhile to have some clues for the proper
determination of $n_1$. In this connection Cochran [1] states:

“The average value of $n$ that is required in a given situation depends on the
choice of $n_1$. Exact information about the optimum value of $n_1$ is not yet
available, the optimum being that value which leads to the smallest average $n$.
It appears however that the optimum $n_1$ would be such that a second part will
usually be necessary. In other words, if it is convenient to take the sample in
two parts, $n_1$ should be chosen as somewhat less than the size that seems to be
needed.”

It is the object of this study to throw further light on the choice of the value
of $n_1$, the size of the first part of the sample. For this purpose we first compute
the expected value of the total sample size for given $n_1$ when $\alpha$ and $c = d/\sigma$
are given. The necessary formulae for computing tables of these expected values
are derived.
It was shown by Stein that the computation can be made to depend on the
knowledge of Pearson’s Incomplete Gamma Function $I(u, p)$. An approximation
whereby the computation can be made to depend only on the knowledge of
the normal distribution function $\Phi(L)$ is also developed. Numerical evidence
for the adequacy of the approximation for moderately large values of $n_1$ ($n_1 \ge 61$)
is adduced. Limiting values for the expected value of the total sample size $E(n)$
are given for fixed $n_1$ and $\alpha$ with varying $c$.

The use of the tables in choosing $n_1$ is discussed. If we have an approximate
knowledge of $\sigma$, enabling us to fix an interval in which $\sigma$ might be assumed to
lie, then the value of $n_1$ can be determined with the help of the tables by using
the minimax principle (i.e., minimizing the maximum loss in observations due
to ignorance of $\sigma$). Reasons are given which point to 250 as a practical upper
limit for the size of the first part of the sample in a two-stage sampling scheme
when we have no precise knowledge of $\sigma$. Tables for four different significance
levels $\alpha$ are included at the end of the paper.


2. Stein’s method. Stein’s two-stage plan for estimating the population mean
may be stated as follows. Given $d$ and $\alpha$, we start with a random sample of size
$n_1$. An estimate of the population variance $\sigma^2$ is given by the sample variance

(2)  $s_1^2 = \sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 / n_0,$

where $n_0 = n_1 - 1$ and $\bar{x}_1$ is the sample mean. A confidence interval for $m$ may
now be calculated. The half-width of this interval is given by

(3)  $s_1 t / \sqrt{n_1},$

where $t = t(\alpha, n_0)$ is the value of $t$ corresponding to the given significance level
$\alpha$, for $n_0$ degrees of freedom. If

(4)  $s_1 t / \sqrt{n_1} \le d,$

the sample is already sufficiently large and we stop here, saying that the estimate
$T$ of $m$ is given by $T = \bar{x}_1$. If

(5)  $s_1 t / \sqrt{n_1} > d,$

then additional observations are taken so that the total sample size is the
smallest integer not less than $n$, where $n$ is given by

(6)  $n = s_1^2 t^2 / d^2.$

The estimate of $m$ is now given by $T = \bar{x}$, where $\bar{x}$ is the mean of the total
sample.

When this procedure is adopted (1) is satisfied. Thus given $n_1$, $\alpha$ and $d$,
if the event (4) occurs then the total sample size is $n_1$. On the other hand, if
(5) occurs, we shall make our calculations as if the total sample size were $n$
given by (6), neglecting the small discrepancy introduced by the fractional
nature of $n$.
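
The rule stated in (2) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stein_total_sample_size` is hypothetical, and the value of $t$ for the chosen $\alpha$ and $n_0$ degrees of freedom is assumed to be looked up by the caller (for example from a printed $t$ table) rather than computed here.

```python
import math
from statistics import variance


def stein_total_sample_size(first_sample, t, d):
    """Total sample size under the two-stage rule of equations (2)-(6).

    first_sample : observations of the first stage (n_1 >= 2 values)
    t            : Student t value for level alpha and n_0 = n_1 - 1 d.f.,
                   supplied by the caller (assumption of this sketch)
    d            : allowable discrepancy
    """
    n1 = len(first_sample)
    s1_sq = variance(first_sample)            # divisor n_0 = n_1 - 1, as in (2)
    half_width = math.sqrt(s1_sq) * t / math.sqrt(n1)    # expression (3)
    if half_width <= d:                       # event (4): the first part suffices
        return n1
    # event (5): smallest integer not less than n = s_1^2 t^2 / d^2, as in (6)
    return math.ceil(s1_sq * t * t / (d * d))
```

The second part of the sample then consists of the returned total minus $n_1$ further observations.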
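3. Expected sample size E(n). Now

(7)  $\chi^2 = n_0 s_1^2 / \sigma^2$

obeys the $\chi^2$ distribution with $n_0$ degrees of freedom. Setting $d^2/\sigma^2 = c^2$, we see
that when (4) happens

(8)  $0 \le \chi^2 \le c^2 n_0 n_1 / t^2,$

and $n = n_1$. When (5) happens

(9)  $\chi^2 > c^2 n_0 n_1 / t^2,$

and $n = t^2 \chi^2 / (c^2 n_0)$. Recalling that the $\chi^2$ distribution with $n_0$ degrees of freedom
is

(10)  $f(\chi^2, n_0) = \dfrac{(\chi^2)^{n_0/2 - 1} e^{-\chi^2/2}}{2^{n_0/2}\, \Gamma(n_0/2)},$

we may write

(11)  $E(n) = \int_0^{\infty} n f(\chi^2, n_0)\, d\chi^2 = \int_0^{\chi_0^2} n_1 f(\chi^2, n_0)\, d\chi^2 + \int_{\chi_0^2}^{\infty} \dfrac{t^2 \chi^2}{c^2 n_0} f(\chi^2, n_0)\, d\chi^2,$

where $\chi_0^2 = c^2 n_0 n_1 / t^2$. Evaluating the second integral above by parts we have

(12)  $E(n) = n_1 \int_0^{\chi_0^2} f(\chi^2, n_0)\, d\chi^2 + \dfrac{t^2}{c^2} \left[ \int_{\chi_0^2}^{\infty} f(\chi^2, n_0)\, d\chi^2 + K \right],$

where

(13)  $K = \dfrac{(\chi_0^2/2)^{n_0/2}}{(n_0/2)\, \Gamma(n_0/2)\, e^{\chi_0^2/2}}.$

Let

(14)  $F(\chi_0^2) = \int_0^{\chi_0^2} f(\chi^2, n_0)\, d\chi^2.$

Now Karl Pearson’s Incomplete Gamma Function is defined as

(15)  $I(u, p) = \int_0^{u\sqrt{p+1}} \dfrac{e^{-v} v^p}{\Gamma(p+1)}\, dv.$

Putting $v = \chi^2/2$, $p = n_0/2 - 1$, $u = \chi_0^2/\sqrt{2 n_0}$, we see that

(16)  $F(\chi_0^2) = I(\chi_0^2/\sqrt{2 n_0},\ n_0/2 - 1).$

Using (16) in (12) we have

(17)  $E(n) = \left[ (n_0 + 1) - \dfrac{t^2}{c^2} \right] F(\chi_0^2) + \dfrac{t^2}{c^2} [1 + K].$

(This formula may be compared with formula (16), p. 247 of Stein’s paper.)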
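
A minimal numerical sketch of formula (17) follows. It assumes the forms $E(n) = [(n_0 + 1) - t^2/c^2] F(\chi_0^2) + (t^2/c^2)(1 + K)$ with $\chi_0^2 = c^2 n_0 n_1/t^2$ and $K = (\chi_0^2/2)^{n_0/2} e^{-\chi_0^2/2} / \Gamma(n_0/2 + 1)$. The function names are hypothetical, $F$ is evaluated by the power series for the incomplete gamma function instead of Pearson's tables, and $t$ is supplied by the caller.

```python
import math


def chi2_cdf(x, dof):
    """P(chi-square with `dof` degrees of freedom <= x), via the power
    series of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, y = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= y / (a + k)
        total += term
    value = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return min(1.0, value)


def expected_n_exact(n0, t, c):
    """Expected total sample size by formula (17); t = t(alpha, n0)."""
    n1 = n0 + 1
    ratio = t * t / (c * c)                  # t^2 / c^2
    chi0_sq = n0 * n1 / ratio                # c^2 n0 n1 / t^2
    F = chi2_cdf(chi0_sq, n0)
    half = chi0_sq / 2.0
    # K = (chi0^2/2)^(n0/2) exp(-chi0^2/2) / Gamma(n0/2 + 1), computed in logs
    K = math.exp((n0 / 2.0) * math.log(half) - half
                 - math.lgamma(n0 / 2.0 + 1.0))
    return (n1 - ratio) * F + ratio * (1.0 + K)
```

For small $c$ the result is dominated by $t^2/c^2$, and for large $c$ it settles at $n_0 + 1$, in line with the limiting behaviour discussed in Section 6.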
4. Normal approximation for expected sample size. For large values of $n_1$ we
may use the fact that

$x = \sqrt{2\chi^2} - \sqrt{2n_1 - 3}$

is asymptotically normally distributed with zero mean and unit variance. We
shall discuss later, on the basis of the evidence provided in the numerical calculations,
what constitutes a sufficiently large value of $n_1$ for the approximation
derived below to hold.

When (4) happens, (8) may be written

(18)  $-\sqrt{2n_1 - 3} \le x \le L,$

where

(19)  $L = \dfrac{c}{t} \sqrt{2 n_0 n_1} - \sqrt{2n_1 - 3}.$

The sample size is in this case $n = n_1$.
When (5) holds, (9) may be written

(20)  $x > L.$

Thus the sample size in this case is

$n = \dfrac{t^2}{2 c^2 n_0} \left( x + \sqrt{2n_1 - 3} \right)^2.$

Hence the expected sample size can be written

(21)  $E(n) = \dfrac{n_1}{\sqrt{2\pi}} \int_{-\infty}^{L} e^{-x^2/2}\, dx + \dfrac{1}{\sqrt{2\pi}} \int_{L}^{\infty} \dfrac{t^2}{2 c^2 n_0} \left( x + \sqrt{2n_1 - 3} \right)^2 e^{-x^2/2}\, dx,$

where in the first integral in (21), the lower limit has been put as $-\infty$ instead
of $-\sqrt{2n_1 - 3}$. For moderately large values of $n_1$, say $n_1 > 10$, this will cause
only a negligible difference. On evaluating the second integral in (21) we get

(22)  $E(n) = \left[ (n_0 + 1) - \dfrac{t^2}{c^2} \right] \Phi(L) + \dfrac{t^2}{c^2} \left[ 1 + \dfrac{(L + 2\sqrt{2n_1 - 3})\, e^{-L^2/2}}{2 n_0 \sqrt{2\pi}} \right],$

where as usual

$\Phi(L) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{L} e^{-x^2/2}\, dx.$
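
The normal approximation can be checked numerically with the sketch below. It assumes that evaluating the second integral in (21) gives $E(n) = [(n_0 + 1) - t^2/c^2]\Phi(L) + (t^2/c^2)[1 + (L + 2\sqrt{2n_1 - 3})\, e^{-L^2/2} / (2 n_0 \sqrt{2\pi})]$, with $L$ as in (19). The function name is hypothetical and $t$ is again supplied by the caller.

```python
import math


def expected_n_normal(n0, t, c):
    """Normal approximation to the expected total sample size.

    n0 : degrees of freedom of the first-stage variance (n_1 - 1)
    t  : Student t value for the chosen alpha and n0 (caller-supplied)
    c  : d / sigma
    """
    n1 = n0 + 1
    ratio = t * t / (c * c)                                  # t^2 / c^2
    shift = math.sqrt(2.0 * n1 - 3.0)
    L = (c / t) * math.sqrt(2.0 * n0 * n1) - shift           # equation (19)
    Phi = 0.5 * (1.0 + math.erf(L / math.sqrt(2.0)))         # normal c.d.f.
    density = math.exp(-0.5 * L * L) / math.sqrt(2.0 * math.pi)
    correction = (L + 2.0 * shift) * density / (2.0 * n0)
    return (n1 - ratio) * Phi + ratio * (1.0 + correction)
```

Only the standard library is needed, since $\Phi$ is expressed through `math.erf`.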


5. Make-up of tables and use of normal approximation. Formulae (17) and
(22) were used to determine $E(n)$ for assigned values of $n_0 = n_1 - 1$ with four
different significance levels $\alpha$ = .01, .02, .05 and .10 and values of $c = d/\sigma$
ranging from .01 to 1.0. For each combination of $c$ and $\alpha$, a value of $E(n)$ is
listed in the tables. The size of a sufficiently large value of $n_1$ for the normal
approximation to be valid was determined on the basis of the numerical calculations.
For $10 \le n_0 \le 60$, the values of $E(n)$ were computed both by the $\chi^2$
method and by using the normal approximation, which served primarily as
a check in this interval. A comparison of the results for $E(n)$ is given below for
a portion of the data as computed for Table IIB.

                     Expected sample size for α = .05

                                    d/σ
 n_0     χ²      N       χ²      N       χ²     N       χ²     N       χ²     N
 60    400.00  400.00  100.03  100.04  61.11  61.10  61.00  61.00  61.00  61.00
 30    416.98  416.98  104.25  104.28  46.65  46.71  32.02  31.99  31.01  30.94
 20    435.14  435.14  108.75  108.78  48.33  48.40  28.15  28.20  21.95  21.92

A similar comparison table for $\alpha$ = .01 was computed and the same degree of
agreement was found to exist. In no instances did any pair of values of $E(n)$
computed by (17) and (22) differ by more than one observation. In fact, for
$n_0 = 60$, the two computed values of $E(n)$ were so close as to warrant use of
only the normal approximation when $n_0 > 60$ (for all four significance levels).
Figures in Tables I to IV are correct to the penultimate digit. The last digit
shown may be in error by unity (e.g., a value listed as 20.8 might actually be
20.7 or 20.9).


6. Limiting values of E(n). Let us consider the value of $E(n)$ given by (17).
For fixed $n_0$ and $\alpha$, $t$ is fixed. As $c$ increases, both $\chi_0^2$ and $F(\chi_0^2)$ increase.
Differentiating $E(n)$ with respect to $\chi_0^2$, we get

(23)  $E'(n) = \dfrac{n_0 (n_0 + 1)}{\chi_0^4} \left[ F(\chi_0^2) - 1 - K \right].$

Now $F(\chi_0^2) < 1$ and $K \ge 0$. Hence $E'(n)$ is negative, which shows that $E(n)$ is
a monotonically decreasing function of $\chi_0^2$ or of $c$ for fixed $n_0$ and $\alpha$. With $t$
fixed, as $c^2$ increases $F(\chi_0^2) \to 1$ and $K \to 0$, causing $E(n)$ to approach $n_0 + 1$.
It should be noted in the tables that for any row the value of $E(n)$ for increasing
$c$ has been given until it sensibly becomes equal to $n_0 + 1$. In the blank space
left thereafter in the Tables IB, IIB, IIIB and IVB values of $E(n)$ will sensibly
remain equal to $n_0 + 1$, since obviously,

(24)  $E(n) \ge n_0 + 1.$

Let us set

(25)  $E(n) - t^2/c^2 = \phi(\chi_0^2).$

Integrating (14) by parts we find

(26)  $F(\chi_0^2) = K + \dfrac{1}{n_0} \int_0^{\chi_0^2} \dfrac{(\chi^2)^{n_0/2} e^{-\chi^2/2}}{2^{n_0/2}\, \Gamma(n_0/2)}\, d\chi^2.$

Hence

(27)  $\phi(\chi_0^2) = (n_0 + 1) \int_0^{\chi_0^2} f(\chi^2, n_0) \left( 1 - \dfrac{\chi^2}{\chi_0^2} \right) d\chi^2.$

Also,

(28)  $\phi'(\chi_0^2) = E'(n) + \dfrac{n_0 (n_0 + 1)}{\chi_0^4} = \dfrac{n_0 + 1}{\chi_0^4} \int_0^{\chi_0^2} \dfrac{(\chi^2)^{n_0/2} e^{-\chi^2/2}}{2^{n_0/2}\, \Gamma(n_0/2)}\, d\chi^2.$

Now $0 < \chi^2 \le \chi_0^2$. Hence $0 \le (1 - \chi^2/\chi_0^2) \le 1$. It follows from (27) that

(29)  $0 \le \phi(\chi_0^2) \le (n_0 + 1) F(\chi_0^2).$

Hence $\phi(\chi_0^2)$ is positive and tends to zero as $\chi_0^2 \to 0$. Thus

(30)  $E(n) \ge t^2/c^2,$

and

(31)  $\lim_{c \to 0} [E(n) - t^2/c^2] = 0.$

Also from (28), $\phi'(\chi_0^2)$ is positive, which shows that $E(n) - t^2/c^2$ monotonically
decreases as $c$ decreases. In calculation of the tables we have used the fact that
when $E(n)$ sensibly becomes equal to $t^2/c^2$ for any value of $c$, it will remain so
for smaller values of $c$ ($n_0$ and $\alpha$ remaining fixed).


7. Use of tables for a two-stage scheme. When an approximate estimate of
$\sigma$ is available, and the sample size is determined by one-stage sampling, the
mean eventually determined may have less accuracy than what is desired, for
it may turn out that the estimate of $\sigma$ is in error. This situation is avoided by
using Stein’s two-stage plan. Our Tables I, II, III and IV give a good guidance
for choosing a suitable size for the first part of the sample when used in conjunction
with the minimax principle (i.e., minimizing the maximum loss in observations
due to ignorance of $\sigma$). Of course, the tables allow for only four significance
levels, namely $\alpha$ = .1, .05, .02 and .01, but these are likely to suffice in practice.

Let $E(n \mid n_1)$ denote the value of $E(n)$ corresponding to $n_1$. For each value of
$c$ in the tables we have a minimum value for $E(n) = t_\infty^2/c^2 = E(n \mid t_\infty^2/c^2)$, which
is the total sample size if $\sigma$ were known. Let $D = E(n \mid n_1) - E(n \mid t_\infty^2/c^2)$. This
difference $D$ represents the loss in observations brought about by our ignorance
of $\sigma$.

If it is believed or assumed that $\sigma$ lies within a certain interval, say $\sigma_1 \le \sigma \le \sigma_2$,
we can calculate, for fixed $d$, an interval for $c$, say $c_1 \le c \le c_2$. In choosing a
starting sample size $n_1$ we want to select that value of $n_1$ which gives us an
optimum $D$, that is, that value of $D$ which causes us to lose the least number of
observations.

Let $n_1^*$ be the value of $n_1$ corresponding to the optimum $D$. The following
procedure may be employed to find an optimum $D$ and $n_1^*$. For each value of
$c$ in our guessed interval there is a smallest $E(n) = E(n \mid n_1)$ and its corresponding
$n_1$. We take these values of $n_1$ along with our values for $c$ and tabulate
values for $D$. From this tabulation we select the two $n_1$ values which would
lead to the smallest value of $D$ over the interval for $c$. Using interpolation
between these two values of $n_1$ we arrive at an optimum $D$ and $n_1^*$.

Thus $n_1 = [n_1^*]$, where $[n_1^*]$ indicates the smallest integer $\ge n_1^*$, gives us the
first part of our sample. After the estimated variance $s_1^2$ from this part of the
sample has been calculated, we proceed to take the second part of our sample.
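
The tabulation step of this procedure can be sketched as follows. The sketch is a simplification under stated assumptions: it picks the best candidate from a finite list and omits the final interpolation between the two best values of $n_1$; the name `minimax_first_stage` and the callable `expected_n` (for instance a lookup in the tables) are hypothetical, and `t_inf` denotes the normal deviate $t_\infty$ for the chosen $\alpha$.

```python
def minimax_first_stage(expected_n, candidates, c_values, t_inf):
    """Return (n_1, worst loss D) minimizing the maximum loss over c_values.

    expected_n(n1, c) : E(n | n_1) for c = d / sigma, e.g. a table lookup
    candidates        : first-stage sizes n_1 to compare
    c_values          : values of c spanning the assumed interval
    t_inf             : normal deviate for the chosen alpha
    """
    best = None
    for n1 in candidates:
        # D = E(n | n_1) - t_inf^2 / c^2, the loss due to not knowing sigma
        worst = max(expected_n(n1, c) - (t_inf / c) ** 2 for c in c_values)
        if best is None or worst < best[1]:
            best = (n1, worst)
    return best
```

Passing a finer list of candidates narrows the gap to the interpolated optimum $n_1^*$ described above.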

The above discussion may be illustrated by considering a specific example
based on data by Cochran [1]. Suppose $d = 10$, $\alpha = .05$ and $\sigma$ is believed to lie
in the interval $100 \ge \sigma \ge 25$. Thus $.1 \le c \le .4$. In Table II for $c$ = .1, .2, .3
and .4 we find that the $n_1$ corresponding to the smallest $E(n \mid n_1)$ for these values
of $c$ are $n_1$ = 241, 61, 31 and 21. Using these values of $n_1$ with the values of $c$,
we get the following tabulated values of $D$.


Starting sample
size n_1            D

  241
   61              16
   47.3            23.7
   31              33
   21              51


From the above figures we see that if we started with $n_1 = 61$ we might expect
to lose no more than 37 observations, or if $n_1 = 31$ we might expect to lose no
more than 33 observations. Interpolating between these two values of $n_1$, we
find that the optimum $D$ is 23.7 and $n_1^* = 47.3$. Thus $n_1 = 48$ should be our
starting sample size and should cause us to lose no more than 24 observations
if $\sigma$ lies between 100 and 25.

Thus when the sample is to be taken in two parts and we can assume an
interval for $\sigma$, use of the minimax principle in conjunction with our tables is
recommended as the procedure to be adopted in selecting a starting sample
size.


8. An upper limit for $n_1$. The minimax principle is very satisfactory if the
true value of $\sigma$ lies within our assumed interval. However, if we do not feel safe
in assuming an interval for $\sigma$, we still want to limit our loss in observations if at
all possible.

An examination of the tables reveals that if $c \le .1$ or $\sigma \ge 10d$, then for all
four significance levels the expected sample size for $n_1 = 241$ differs from the
minimum $E(n)$ given by $t_\infty^2/c^2$ by comparatively few readings. For fixed $\alpha$,
$n_1$ and $c$, the value of $E(n)$ could never be smaller than the value given by
$t_\infty^2/c^2$. If we define percentage loss as $[E(n \mid n_1) - E(n \mid t_\infty^2/c^2)]/E(n \mid n_1)$, we see
that this ratio is less than or equal to .02 for all four significance levels when
$c \le .1$ and $n_1 = 241$. Since $E(n \mid n_1 = 241) > E(n \mid n_1 > 241)$ our percentage
loss could never be greater than .02 for $n_1 > 241$ and $c \le .1$.

If we have no precise knowledge of $\sigma$ we could go far wrong in choosing $n_1$.
For example, suppose $\alpha = .10$ and we choose $n_1 = 2400$. Using formula (22) we
can compute $E(n \mid n_1 = 2400)$ for any given $c$. Let us consider the following
tabulation.

                          Expected sample size for α = .10

                                             c
                .01    .02    .03    .04    .05    .06    .07    .08    .09    .10    .20
$t_\infty^2/c^2$  27060   6765   3007   1691   1082    752    552    423    334    271   67.6
2400            27090   6772   3010   2400   2400   2400   2400   2400   2400   2400   2400
240             27290   6822   3032   1706   1092    758    557    426    337    273    241

We see from the above figures that if the true $\sigma$ leads to a value of $c < .03$,
$n_1 = 2400$ would be a slightly better starting sample size than $n_1 = 241$. But, if
the true $\sigma$ leads to a value of $c > .03$, $n_1 = 241$ is far more efficient than $n_1 = 2400$
since our values of $E(n \mid n_1 = 241)$ are considerably smaller than the
$E(n \mid n_1 = 2400)$.

There is of course no peculiar virtue in the precise number 241 and we may
round off our figures to 250, stating the following rule. When using Stein’s two-stage
sampling scheme and the value of $\sigma$ is uncertain, but there is reason to believe
that $\sigma$ is not so small as to make $c = d/\sigma$ appreciably greater than .1, then the
size of the first part of the sample should be taken to be 250 or thereabouts.
Thus we may regard 250 as a sort of practical upper limit for the size of the
first part of our sample.


I am indebted to Professor R. C. Bose under whose guidance this research 
was carried out. 
(Tables I to IV follow)

TABLE IA
Expected sample size for α = .10

                                d/σ
                     .01      .02      .03      .04      .05
$t_\infty^2/c^2$   27,060    6,765    3,007    1,691    1,082
240                27,290    6,822    3,032    1,706    1,092
120                27,500    6,875    3,055    1,719    1,100

TABLE IB
[entries not recoverable]

TABLE IIA
Expected sample size for α = .05

                                d/σ
                     .01      .02      .03      .04      .05      .06
$t_\infty^2/c^2$   38,410    9,602    4,268     ...      ...     1,067
240                38,810    9,702    4,312    2,426     ...     1,078
120                39,200    9,800    4,355     ...      ...     1,089

TABLE IIB
[entries not recoverable]

TABLE IIIA
Expected sample size for α = .02

                                d/σ
                     .01      .02      .03      .04
$t_\infty^2/c^2$   54,100   13,520    6,011    3,381
240                54,850   13,712    6,094    3,428
120                55,600     ...     6,178    3,475

TABLE IIIB
[entries not recoverable]

TABLE IVA
Expected sample size for α = .01

                                d/σ
                     .02      .03      .04      .06      .07      .08      .09
$t_\infty^2/c^2$   16,590    7,373    4,148    1,843    1,354    1,037      819
240                16,860    7,493    4,215    1,873    1,376    1,054      833
120                17,125    7,611    4,281    1,905    1,398    1,070      846

TABLE IVB
[entries not recoverable]



DISTRIBUTION OF THE MEASURE OF A RANDOM 
TWO-DIMENSIONAL SET 


By HERBERT SOLOMON 


Teachers College, Columbia University 


1. Summary. This paper considers the distribution of the measure of a special
random two-dimensional set. Related work, usually motivated by a search for
principles for bombing operations, deals exclusively with moment problems and
appears in [1], [3], [4], [5], [6], [7], [8]. A one-dimensional distribution problem
appears in [2]. The random set considered is the intersection of a fixed circle
with the union of $N$ random circles. Centers of the random circles are subject to
the variability imposed by the bivariate normal distribution with circular symmetry
and means not necessarily coincident with the coordinates of the center of
the fixed circle. The measure of interest is the ratio of the area of the intersection
(“covered area”) to the total area of the fixed circle. For $N = 1$, the distribution
is determined and its use facilitated by the graphs in Fig. 1 and Fig. 2. A procedure
for obtaining upper and lower bounds of the distribution for $N = 2$ is
given. Tables I, II, III, and IV give upper and lower bounds for the percentage
points of the distribution for $N = 2$ for some special illustrative situations.
For $N = 1$ in all situations, and for $N = 2$ in many situations, the graphs
and tables demonstrate that a realistic decision can be made rather easily
without resorting to the usual practice of random number “Monte Carlo” devices
for each ad hoc situation of interest.


2. Development of distribution. Consider a fixed circle of radius $T$ and an
aiming point at a distance $R$ from the center of the fixed circle at which are
dropped $N$ random circles of equal radius $W$ according to the aforementioned
bivariate normal distribution specified by the parameter $\sigma$. Define $c$ $(0 < c < 1)$
as the fraction coverage, that is, the ratio of the area of the intersection to the
total area of the fixed circle. We are interested in finding

$P_C = P\{c \ge C \mid W, T, R, \sigma, N\},$

where $P_C$ is the probability of getting at least $C$ fraction coverage for specified
values of the parameters: $W, T, R, \sigma, N$. In order to achieve $C$ coverage for
$N = 1$, the center of the random circle must fall on or within the circle having
the same center as the fixed circle and a radius $R^* = W + aT$. The relationship
between $a$ and $c$ is developed by integration from the geometry of the picture
and is

$c = \dfrac{1 + S^2}{2} - \dfrac{1}{\pi} \left[ f(\alpha) - S^2 f(\beta) \right],$

where $S = W/T$, $f(z) = \arcsin z + z\sqrt{1 - z^2}$, $2\alpha = v + (1 - S^2)/v$,
$2\beta = -v/S + (1 - S^2)/(vS)$ and $v = S + a$. This relationship is graphically depicted
in Fig. 2.
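
Where Fig. 2 is not at hand, the relationship between $a$ and $c$ can be evaluated directly. The sketch below assumes the formula $c = (1 + S^2)/2 - [f(\alpha) - S^2 f(\beta)]/\pi$ with $f$, $\alpha$, $\beta$ and $v = S + a$ as defined above; the function name is hypothetical, and the two boundary branches (circles disjoint, or one circle lying inside the other) are an addition of this sketch.

```python
import math


def _f(z):
    """f(z) = arcsin z + z sqrt(1 - z^2), with z clamped to [-1, 1]."""
    z = max(-1.0, min(1.0, z))
    return math.asin(z) + z * math.sqrt(1.0 - z * z)


def fraction_coverage(a, S):
    """Fraction c of the fixed circle covered, for v = S + a and S = W/T."""
    v = S + a                          # distance between centers in units of T
    if v >= 1.0 + S:                   # circles do not overlap
        return 0.0
    if v <= abs(1.0 - S):              # one circle lies inside the other
        return min(1.0, S * S)
    alpha = 0.5 * (v + (1.0 - S * S) / v)
    beta = 0.5 * (-v / S + (1.0 - S * S) / (v * S))
    return 0.5 * (1.0 + S * S) - (_f(alpha) - S * S * _f(beta)) / math.pi
```

With $S = 1$ and $a = .075$ the function returns approximately .35.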

The problem, then, of finding $P_C$ is equivalent to finding the probability of getting a
point in an offset circle whose radius, center, and distance between center and
aiming point are known. From the bivariate normal law assumption, the probability,
$dp$, that the center of the random circle will fall within an area, $dA$, at a
distance, $D$, from the aiming point is $dp = (2\pi\sigma^2)^{-1} \exp(-D^2/2\sigma^2)\, dA$. Since the
center of the fixed circle is at a distance $R$ from the aiming point, if we choose
polar coordinates $\rho, \theta$ about the former, then $D^2 = R^2 + \rho^2 - 2\rho R \cos\theta$ and we
may write

$p = \dfrac{1}{2\pi\sigma^2} \int_0^{R^*} \int_0^{2\pi} \rho \exp\left[ -\dfrac{R^2 + \rho^2 - 2\rho R \cos\theta}{2\sigma^2} \right] d\theta\, d\rho = \int_0^{R^*} \rho I_0(\rho R) \exp\left[ -\dfrac{\rho^2 + R^2}{2} \right] d\rho,$

where in the last expression $R^*$ and $R$ are now in $\sigma$ units; $I_n(x)$ is the modified Bessel function of the
first kind for order $n$ and argument $x$.
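
The integral for $p$ is straightforward to evaluate numerically. The following sketch is an illustration only, not the method used to prepare Fig. 1: it takes $p = \int_0^{R^*} \rho I_0(\rho R) \exp[-(\rho^2 + R^2)/2]\, d\rho$ with $R^*$ and $R$ in $\sigma$ units, computes $I_0$ from its power series, and applies Simpson's rule; the function names and the default number of steps are assumptions.

```python
import math


def bessel_i0(x):
    """Modified Bessel function I_0(x) from its power series."""
    q = 0.25 * x * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total


def offset_circle_probability(r_star, r, steps=2000):
    """Probability that the center falls within radius r_star of a point
    at distance r from the aiming point (both in sigma units).
    `steps` must be even (Simpson's rule)."""
    if r_star <= 0.0:
        return 0.0

    def integrand(rho):
        return rho * bessel_i0(rho * r) * math.exp(-0.5 * (rho * rho + r * r))

    h = r_star / steps
    total = integrand(0.0) + integrand(r_star)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0
```

For $R^* = 1.075$ and $R = 1$ this gives about .30, and for $R = 0$ it reproduces $1 - \exp(-\tfrac{1}{2} R^{*2})$.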

It is useful in graphically depicting the distribution to have the slopes of the
constant contours of probability in the $RR^*$ plane. If we define $q(R^*, R)$ by
$q(R^*, R) + p(R^*, R) = 1$, then by the theorem on implicit functions we have

$\dfrac{dR^*}{dR} = -\dfrac{\partial q/\partial R}{\partial q/\partial R^*} = -\dfrac{\int_0^{R^*} [\rho^2 I_1(\rho R) - \rho R I_0(\rho R)] \exp[-\tfrac{1}{2}(\rho^2 + R^2)]\, d\rho}{R^* I_0(RR^*) \exp[-\tfrac{1}{2}(R^{*2} + R^2)]}$

since $d\{I_0(z)\}/dz = I_1(z)$. Integrating the numerator by parts and making use
of the equality

$\rho R I_1'(\rho R) = -I_1(\rho R) + \rho R I_0(\rho R),$

where the prime refers to differentiation with respect to $(\rho R)$, we get after simplification
the interesting relationship $dR^*/dR = I_1(R^*R)/I_0(R^*R)$. Since for
large $x$, $I_n(x) \sim e^x/\sqrt{2\pi x}$, we get for large $(R^*R)$, $dR^*/dR \sim 1$. Hence the family
of curves in the $RR^*$ plane defined by $P_C$ = constant (see Fig. 1) has a slope
which approaches unity quite rapidly, for from [9] and [10] we see

$\dfrac{I_1(5)}{I_0(5)} = .89, \quad \dfrac{I_1(15)}{I_0(15)} = .97, \quad \dfrac{I_1(37)}{I_0(37)} = .99.$

Since for $R = 0$, $p = 1 - \exp(-\tfrac{1}{2}(R^*)^2)$, it becomes possible to construct the
contours of equal probability because the initial point and the slopes of all points
on any one contour are known. The unabridged printed listings mentioned in
[11] however were used in constructing the contours.
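
The slope relationship $dR^*/dR = I_1(R^*R)/I_0(R^*R)$ can be checked with a short computation. This is a sketch under stated assumptions: the Bessel functions are summed from their power series rather than taken from tables, and the function names are hypothetical.

```python
import math


def bessel_i(order, x):
    """Modified Bessel function I_order(x) for integer order >= 0,
    summed from its power series."""
    half = 0.5 * x
    term = half ** order / math.factorial(order)
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= half * half / (k * (k + order))
        total += term
    return total


def contour_slope(r_star, r):
    """Slope dR*/dR of a constant-probability contour at (R, R*)."""
    x = r_star * r
    return bessel_i(1, x) / bessel_i(0, x)
```

A contour could then be traced by stepping this slope forward from its starting point on the axis $R = 0$, where $p = 1 - \exp(-\tfrac{1}{2}(R^*)^2)$.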
3. Use of the graphs. Figures 1 and 2 represent the distribution for $N = 1$.
As an illustration, suppose $W = 1$, $T = 1$, $R = 1$ ($\sigma$ units) and $P_{.35}$ is desired.
Fig. 2 shows that $a = .075$ for $C = .35$, $S = 1$. Thus $W + aT = 1.075$ and
referring this to Fig. 1 we get $P_{.35} = .30$. Conversely, let $W = 3$, $T = 2$, $R = 2$
($\sigma$ units), $P_C = .50$ and $C$ is desired. In Fig. 1, for $P_C = .50$ and $R = 2$, we
get $W + aT = 2.25$, then $a = -0.375$. Then referring to Fig. 2 with $S = 1.5$,
we get $C = .67$.


4. Upper and lower bounds. Let $\underline{P}(C, N)$ and $\overline{P}(C, N)$ represent lower and
upper bounds for the probability of obtaining $C$ coverage or better when $N$
random circles are dropped for a specified set of parameters: $T, W, R, \sigma$. Then
certainly

$\underline{P}(C, N) = 1 - (1 - P_C)^N,$
$\overline{P}(C, N) = P\{c_1 + c_2 + \cdots + c_N \ge C\},$

where the $c_i$ are the coverage random variables for $N = 1$, and $P_C = P\{c_i \ge C\}$.
Now each $c_i$ has a distribution which is a combination of a continuous distribution
with density $f(c)$ on $[0, 1]$ plus discrete probabilities at 0 and 1 if $W \ge T$, and
at 0 and $(W/T)^2$ if $W < T$; where


$f(c) = -\dfrac{dP_c}{dc}, \quad g(c) = W + a(c) \cdot T, \quad P_c = \int_0^{g(c)} \rho I_0(\rho R) \exp\left[ -\dfrac{\rho^2 + R^2}{2} \right] d\rho.$

Then $f(c) = -g'(c) \cdot g(c) I_0[g(c) \cdot R] \exp(-\tfrac{1}{2}\{[g(c)]^2 + R^2\})$ where $g'(c) = dg(c)/dc$
is never positive. A glance at Fig. 2 will demonstrate this. Thus for $N = 2$
we get

$\overline{P}(C, 2) = P_0^2 - \int_0^C f(c_1)\, dc_1 \int_0^{C - c_1} f(c_2)\, dc_2 + 2(1 - P_0) P_C.$

This reduces to

$\overline{P}(C, 2) = 2P_C - P_C P_0 + \int_0^C f(c_1) P_{C - c_1}\, dc_1.$

Thus $\overline{P}(C, 2)$ is easily determined by numerical integration since $f(c)$ and $P_{C - c}$
can be computed for any value of $c$. To compute $f(c)$, it will be necessary to
find $g'(c) = a'(c) \cdot T$ where the prime refers to differentiation with respect to $c$.
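To find $a'(c)$ we first determine

$\dfrac{dc}{da} = \dfrac{1}{\pi} \left\{ -\sqrt{1 - \alpha^2} \left[ 1 + \dfrac{S^2 - 1}{v^2} \right] + S \sqrt{1 - \beta^2} \left[ \dfrac{S^2 - 1}{v^2} - 1 \right] \right\},$

where $S$, $\alpha$, $\beta$ are as defined previously. Then

$\dfrac{da}{dc} = \pi \left\{ -\sqrt{1 - \alpha^2} \left[ 1 + \dfrac{S^2 - 1}{v^2} \right] + S \sqrt{1 - \beta^2} \left[ \dfrac{S^2 - 1}{v^2} - 1 \right] \right\}^{-1};$

when $S = 1$, this reduces to

$a'(c) = -\dfrac{\pi}{2} \left[ \dfrac{1}{\sqrt{1 - \left( \dfrac{1 + a}{2} \right)^2}} \right].$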

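In the numerical integration trouble results in the interval from 0 to $\Delta c$ because
$g'(c)$ has a discontinuity at $c = 0$. However since we are actually interested in an
upper bound we can replace $\int_0^{\Delta c} f(c_1) P_{C - c_1}\, dc_1$ by

$\left[ \max_{0 \text{ to } \Delta c} P_{C - c_1} I_0[g(c_1) \cdot R] \right] \int_0^{\Delta c} -g'(c_1)\, g(c_1) \exp\left( -\dfrac{[g(c_1)]^2 + R^2}{2} \right) dc_1$

$= \left[ \max_{0 \text{ to } \Delta c} P_{C - c_1} I_0[g(c_1) \cdot R] \right] \left[ \exp\left( -\dfrac{[g(c_1)]^2 + R^2}{2} \right) \right]_0^{\Delta c}$

and thus avoid this difficulty.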
TABLE I
Lower and upper bounds for probability of coverage 


W/o = 1, T/o = 1, R = 0, 


2 2 3 4 5 


.926 | .853 | .733 | .619 | .483 | .3829 | .208 
.937 | .888  .796  .726  .638  .533 .440 


TABLE II 
Lower and upper bounds for probability of coverage 


S = i. G 


.339 | .098 | .924! .798 | .445 | .107 | .997 | .978 | .815 | .407 


.122 | .993 | .947 | .727  .240 1.000 1.000 | .990  .858 


TABLE III 
Lower and upper bounds for probability of coverage 
S = 2, = ., N =2 


0 2 
.609 .974  .916) .609 .190 .999 .996 .935  .609 .190 
.646 .987 .963 | .691 .248 11.000 | .998 | .983 | .752 | .280 


TABLE IV 
Lower and upper bounds for probability of coverage 


’ 
S = 5, 


.360 .069 990 953 . 705 .278 .040 
.900 418 | 1.000 996 .564 .118 
RICHARD VON MISES’ WORK IN PROBABILITY AND STATISTICS


By HARALD CRAMÉR
University of Stockholm and University of California, Berkeley 


Professor Richard von Mises of Harvard University died in June, 1953, shortly 
after his 70th birthday. A native of Austria, he took his doctor’s degree at the 
Technical University of Vienna in 1907, and then acted as lecturer and professor 
at various universities in Austria and Germany until 1920, when he became pro- 
fessor and director of the Institute for Applied Mathematics of the University 
of Berlin. The Hitler regime, depriving the German universities of so many of 
their best men, brought von Mises to Istanbul, and finally, in 1939, to Harvard.
At Harvard, he first was professor of mathematics, and in 1944 became Gordon 
McKay professor of aerodynamics and applied mathematics. 

Richard von Mises was one of those men who have both the ability and the 
energy requisite for taking an active and creative interest in many widely different 
fields. He has made outstanding contributions to subjects as heterogeneous as 
literary criticism, positivistic philosophy, aerodynamics, and probability. In 
this short notice, we shall be concerned exclusively with those of his works that 
belong to the field of probability and mathematical statistics. 

It is well known that Richard von Mises is one of the significant names in the 
history of the tremendous development that has taken place in this field during 
the last thirty years. As can be seen from the appended Selected Bibliography, 
his works on probability and mathematical statistics range from books and papers 
on the foundations of probability which, of course, always represented one of 
his main interests, to investigations dealing with special problems in various 
statistical applications. Only a few of these works can be explicitly mentioned 
here, but it will be attempted to characterize and to follow up some of the main 
lines of thought, along which his contributions seem to group themselves. 

In the year 1919, the two basic papers (37) and (40), which were practically 
the first publications by von Mises on probability, appeared almost simultaneously.
The first of these, “Fundamentalsaetze der Wahrscheinlichkeitsrechnung,”
was concerned with the general theorem in mathematical probability for which,
a year later, Georg Pólya was to propose the now well known name “the central
limit theorem.” The second paper, “Grundlagen der Wahrscheinlichkeitsrechnung,”
on the other hand, gave the first exposition of von Mises’ views with
respect to the foundations of probability theory. Each of these two papers was 
to become the first in a long series of works, and the two groups of works thus 
initiated may perhaps be looked upon as containing the most important of von 
Mises’ contributions to the subject. 

In order to judge the two basic papers correctly, it is necessary to realize the 
situation in mathematical probability theory about the year 1919. Since the 
appearance of the classical treatise of Laplace, a few mathematicians (Tchebycheff,
Markov, Borel, and some others) had done important work in the field,
but the conceptual foundations on which the whole subject rested were still
obscure. There was no commonly accepted definition of mathematical probability,
and in so far as there were any definitions at all, they were clearly inadequate 
for the numerous applications that were made in fields such as population statis- 
tics, molecular physics, and many others. Moreover, with few exceptions, mainly 
belonging to the French and Russian schools, writers on probability did not seem 
to feel under any obligation to conform to the standards of rigor that were re- 
garded as obvious in other parts of mathematics. The admirable work of Lya- 
punov on the central limit theorem seemed to be entirely unknown among mathe- 
maticians. 

In the introductions to his two above-mentioned papers, von Mises gives a 
review of the situation, and arrives at the conclusion, which seems entirely justified,
that “today, probability theory is not a mathematical science.” He then
develops his own program for building up probability theory as a mathematical 
science, starting from the thesis that probability theory should be regarded as the 
mathematical theory of a group of observed phenomena, in the same way as, for 
example, geometry and theoretical mechanics. Just as geometry gives an idealized 
mathematical picture of the large bulk of our observations with respect to the 
configuration and position of bodies in space, so probability theory should be 
constructed to provide a mathematical model of the statistical regularities ob- 
served in cases where a given experiment or observation may be repeated a large 
number of times under similar conditions. 

Starting from this thesis, von Mises develops in (40) his system of foundations,
which was soon to become familiar to all probabilists. We find here the concept
of a collective, the definition of mathematical probability as the limit of a frequency
ratio, and the two fundamental postulates, requiring the existence of
the limiting values of the relevant frequencies, and their invariance under any
place selection. It is shown how the main rules for operating with probabilities
can be deduced from these basic principles, and a system of classification of the
operations used in probability theory is worked out.

The publication of the “Grundlagen” paper aroused a great deal of interest
among mathematicians, statisticians, and philosophers. Quite naturally, opinions 
were divided, and even if the basic view of probability theory as a mathematical 
theory of random phenomena was, on the whole, completely endorsed by most 
mathematicians and statisticians, the collective concept and the two postulates 
were severely criticized by many authors. An extensive literature grew up about 
these questions, and von Mises himself took an active part in the discussion. 
Besides in a number of papers dealing with special problems, particularly related 
to the second postulate, the foundations of the subject are discussed in his two 
treatises (75) and (127) and, above all, in his well known book (64). In this book, 
“Probability, Statistics and Truth,” which has been translated from the original 
German edition into English, Russian, and Spanish, he gives a detailed exposition 
of his system, intended for nonmathematical readers, and also his replies to 
various criticisms, and his comments on alternative systems proposed by others.
It is particularly interesting to read in this book, as well as in the discussion with
Doob published in (117), his comments on the measure-theoretic approach to the
subject favored by a certain number of contemporary mathematicians and
statisticians. Even though this approach, as pointed out, for example, in the well
known book by Kolmogoroff, starts from a conception of the object and char- 
acter of probability theory, which is very close to the one advocated by von Mises 
himself, he takes a strongly critical position against this method of introducing 
the concept of mathematical probability and formulating the axioms expressing 
its basic properties. In a chapter with the expressive title, “A part of the theory
of sets? No!,” he declares that probability theory “remains in all circumstances
a theory of certain observable phenomena, which are idealized in the concept of
a collective.” So far, many of his opponents would be prepared to agree, at least
partly. But then he goes on to say that, from this point of view, he finds it impossible
to “concede the existence of a separate concept of probability based on the
theory of sets, which is sometimes said to contradict the concept of probability
based on the notion of relative frequency,” and that “there can be also no question
of reconciling these two concepts.” This seems to be a final expression of
his standpoint in the question. The discussion will undoubtedly be continued 
during many years to come, but, however the question of the best choice of 
axiomatic foundations of probability theory may be decided (if it will ever be 
decided at all), it will always stand out as the great achievement of von Mises 
to have been the first to draw general attention to the problem, to have indicated 
the way along which its possible solutions should be sought, and to have given 
one such solution. 

Let us now pass to the second main group of von Mises’ writings in our field, 
the one that begins with the “Fundamentalsaetze” paper (37). As already mentioned,
this paper is concerned with the central limit theorem, and with certain
other problems belonging to the same general range of ideas. A proof of the 
asymptotic normality of the distribution of a sum of independent random vari- 
ables is given under certain rather restrictive conditions. There is a detailed dis- 
cussion of the asymptotic behavior of the distribution of the sum in the case 
when the distributions of the terms belong to one of those simple classes that 
are usually encountered in the applications. Similar results are given in respect 
of the asymptotic behavior of the a posteriori distributions obtained by applying 
Bayes’ theorem to a sample of observations. 

Von Mises returned to this subject in a great number of his works, continually 
improving his results and extending the field of problems considered. The most 
important papers belonging to this group are (87), (89), (93), (102), (105) and 
(126). With respect to the central limit theorem itself, his main results were 
superseded by others, but he soon generalized the problem in a very interesting 
way, where he was able to find important new results, and which still seems to 
open possibilities for further research. We shall give a brief characterization of
the problem considered in this group of his writings, which may be said to culminate
in the paper (126). Let $U(x_1, \ldots, x_n)$ be a symmetric function of the independent
random variables $x_i$, which are assumed to have a common distribution
function $F(x)$. (Von Mises considers the general case of unequal distributions.)
Then $U$ may be regarded as a function $V(S_n)$ of the repartition function $S_n =
S_n(x)$, where $nS_n(x)$ denotes for every $x$ the number of $x_i < x$. It is assumed that
the function $V$ can be defined on a convex domain $D$ in the space of all distribution
functions, including the given distribution function $F$ as well as all possible
repartitions $S_n$ with $n = 1, 2, \ldots$. It is required to study the asymptotic behavior
of the distribution of the random variable $V(S_n)$ as $n$ tends to infinity.
Following Volterra, von Mises defines the derivatives of the function V at any 
given point of D and shows that, subject to certain general regularity conditions, 
the distribution of V is asymptotically normal if the first derivative of V at the 
point F is different from zero. This covers, among others, the classical case of 
the sum of the $x_i$, and also most ordinary statistics based on moments. When
the first derivative vanishes, certain nonnormal limiting distributions are ob- 
tained, and von Mises gives a detailed discussion of the case (including, among 
others, the case of the $\chi^2$ statistic) when there is a nonvanishing second derivative.
The limiting distribution in this case is shown to be intimately related to
the Fredholm determinant of a certain symmetric kernel. As already mentioned, 
interesting problems in this direction still seem to be open for research. 

Finally, we shall only briefly mention some other main groups among the works 
of von Mises. In the papers (96), (100), (111), and (112), the relations between 
various moments and other characteristics of a probability distribution, and 
between these characteristics and the values of the corresponding distribution 
function, are studied, and some important inequalities are given. The papers 
(109), (119) and (121) are concerned with Bayes’ theorem and its various applica- 
tions, a subject which also receives great attention in the treatises (75) and (127). 
Von Mises did not sympathize with the tendency in contemporary mathematical 
statistics to avoid the use of Bayes’ theorem and the concept of a priori probability.
In the works belonging to this group, he discusses the application of the
theorem to various problems, including the problem of testing statistical hypoth- 
eses where, according to him, it leads to more reliable results than the methods 
now currently employed by mathematical statisticians. 

As a glance at the list of publications will show, there are many works of von 
Mises in this field that have not been mentioned at all in the above. The majority 
of these papers are concerned with various applications of probability theory, in 
fields as diverse as physics, genetics, demography, and actuarial science. 

This brief and insufficient review of a small part of the writings of Richard von 
Mises will certainly be enough to give the reader a strong impression of an active 
and powerful scientific personality. Those who knew him saw, in addition, many 
other sides of his personality, giving him a human charm that his friends will 
never forget.

NOTES


TABLES FOR A NONPARAMETRIC TEST OF DISPERSION 
By S. Rosenbaum
Directorate of Army Health, London 


1. Introduction. Two samples are known to come from populations with the 
same mean (or median). Without reference to the form of the distribution 
function, we want to test the hypothesis that the populations are identical, and 
we simply count the number of points in one sample which lie outside the extreme
values of the other sample. The device owes its origin to S. S. Wilks [1],
who derived the basic formulas in a classic paper on tolerance limits. In the 
absence of the prior knowledge that the two populations have the same location, 
the test becomes merely a two-sample test of identity of the populations and 
rejection may imply difference in location as well as shape. (A one-sided modifi- 
cation of the test would, of course, be more appropriate for a specific test of 
location.) 


2. Test that the samples come from the same population. We draw a random 
sample of n points and a random sample of m points from a population with a 
continuous distribution function. The probability that r points of the sample 
of m will lie outside the end values of the sample of n is [1] 


$$P_r = n(n-1)\,\frac{m!}{(m-r)!}\,\frac{(r+1)\,(n+m-r-2)!}{(n+m)!}
      = n(n-1)\binom{m}{r}B(n+m-1-r,\; r+2),$$


where B is the complete Beta function. 
For $r_0 \le m$,

$$\sum_{r=0}^{r_0} P_r = \sum_{r=0}^{r_0} n(n-1)\binom{m}{r}B(n+m-1-r,\; r+2)$$

is the probability that the value of $r$ is not greater than $r_0$. Otherwise stated,
one minus this quantity is the probability that the value of $r$ is $r_0 + 1$ or greater.
We can therefore fix a probability level $\epsilon$ and arrive at an $r_0$ such that

$$\sum_{r=0}^{r_0-1} P_r < \epsilon \le \sum_{r=0}^{r_0} P_r.$$

Tables of $r = r_0 + 1$ are given for $\epsilon = 0.95$ and $\epsilon = 0.99$ over the range

Lo-> 50. m = 1, --- , 

It may be worth pointing out that when $r_0 = m$, the hypothesis cannot be
rejected whatever value $r$ takes in the sample. Also a certain symmetry arises
for $n = m$, where $P_0 = n(n-1)/[2n(2n-1)]$, $P_1 = n(n-1)\,2n/[2n(2n-1)(2n-2)]$,
so that $P_0 + P_1 = \tfrac{1}{2}$. This is equivalent to saying that for
equal samples of any size from the same population, whatever the form of its
distribution function, the chances are even that not more than one point of the
second sample shall lie outside the range of values of the first.
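
The formula for $P_r$ can be checked directly. The following Python sketch evaluates $P_r$ exactly in rational arithmetic and searches for $r = r_0 + 1$. The function names `p_outside` and `critical_r` are illustrative and do not come from the paper, and the search assumes that $r_0$ is the smallest value whose cumulative probability reaches $\epsilon$.

```python
from fractions import Fraction
from math import factorial


def p_outside(r, n, m):
    """Exact P_r: probability that exactly r of the m points lie outside
    the extreme values of the sample of n (requires n >= 2, 0 <= r <= m)."""
    num = n * (n - 1) * factorial(m) * (r + 1) * factorial(n + m - r - 2)
    den = factorial(m - r) * factorial(n + m)
    return Fraction(num, den)


def critical_r(n, m, eps):
    """Return r = r0 + 1, where r0 is the smallest value with
    P_0 + ... + P_r0 >= eps (an assumed reading of the rule), or None
    when r0 = m, in which case the hypothesis can never be rejected."""
    cumulative = Fraction(0)
    for r0 in range(m + 1):
        cumulative += p_outside(r0, n, m)
        if cumulative >= eps:
            return r0 + 1 if r0 < m else None
    return None


if __name__ == "__main__":
    n = m = 10
    print(sum(p_outside(r, n, m) for r in range(m + 1)))  # total probability
    print(p_outside(0, n, m) + p_outside(1, n, m))        # P_0 + P_1 for n = m
    print(critical_r(1000, 1000, Fraction(95, 100)))      # large equal samples
    print(critical_r(1000, 1000, Fraction(99, 100)))
```

Exact fractions are used so that identities such as the total probability being one can be confirmed without rounding error.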

For large and approximately equal values of $m$ and $n$, say $m = n = N \to \infty$,
$P_r$ is approximately equal to $N^2(r+1)N^r/(2N)^{r+2} = (r+1)/2^{r+2}$. Then the
critical values of $r_0 + 1$ turn out to be,

at $\epsilon = 0.95$, $r_0 + 1 = 7$,
at $\epsilon = 0.99$, $r_0 + 1 = 10$.

The probability is less than 5% that r or more points of a sample of size m lie outside
the extreme values of a sample of size n if the samples are drawn randomly from the same
population, whatever its distribution.

The probability is less than 1% that r or more points of a sample of size m lie outside
the extreme values of a sample of size n if the samples are drawn randomly from the same
population, whatever its distribution.

AN OPTIMUM SLIPPAGE TEST FOR THE VARIANCES OF k
NORMAL DISTRIBUTIONS


By Donald R. Truax
University of Washington


1. Summary and introduction. In many practical problems, the experimenter 
is faced with the task of deciding if the variability within several classes is uni- 
form throughout the classes, or if not, which class exhibits the greatest amount 
of variability. This type of problem arises when the data relate to several proc- 
esses, to the same process at different times, to several different products, or to 
the same products from different sources. If the variability is not uniform 
throughout the classes, then misleading results would be obtained in comparing 
the classes in other respects. If the experimenter expects the variability to be 
uniform throughout the different classes, and if the variability is large in a particular
class, he will consider the situation to be “out of control” and take
measures to locate the source of the large variability. 

The problem we will consider here is that of comparing the variances of $k$
populations, $\Pi_1, \Pi_2, \ldots, \Pi_k$, on the basis of $n$ observations $x_{i1}, x_{i2}, \ldots, x_{in}$
from the $i$th population. We will assume that these observations are normally
and independently distributed with unknown mean $m_i$ and unknown standard
deviation $\sigma_i$ for $i = 1, 2, \ldots, k$. Our problem is to find a statistical procedure
which will, on the basis of these observations, decide if all the populations have
equal variances, and if not, which has the largest variance. We would like the 
procedure to be in some sense “optimum.” We will say that our procedure is 
optimum if, subject to certain restrictions, it maximizes the probability of mak- 
ing the correct decision. A similar problem dealing with the means of several 
normal distributions has been studied by Paulson [1]. 

Let $D_0$ be the decision that all $k$ variances are equal, and let $D_j$ be the decision
that $D_0$ is false and $\sigma_j^2 = \max(\sigma_1^2, \ldots, \sigma_k^2)$ for $j = 1, 2, \ldots, k$. Our problem now
is to find a statistical procedure for selecting one of these $k + 1$ decisions.

Let $x_{i\alpha}$ denote the $\alpha$th observation from the $i$th population, and let
$\bar{x}_i = \sum_{\alpha=1}^{n} x_{i\alpha}/n$. Let $s_i^2 = \sum_{\alpha=1}^{n} (x_{i\alpha} - \bar{x}_i)^2/(n - 1)$ denote the unbiased estimate of
the variance of the $i$th population. We will say that $\Pi_i$ has “slipped to the right”
if $\sigma_1^2 = \cdots = \sigma_{i-1}^2 = \sigma_{i+1}^2 = \cdots = \sigma_k^2 = \sigma^2$ and $\sigma_i^2 = \lambda^2\sigma^2$, where $|\lambda| > 1$. In our first
formulation of the problem we will want to find a statistical procedure which
will select one of the $k + 1$ decisions $D_0, D_1, \ldots, D_k$ so that (a) when all the
variances are equal, $D_0$ should be selected with probability $1 - \alpha$, where $\alpha$ is a
small positive number fixed prior to the experiment.


Since the class of possible decision procedures seems to be too large to admit
an optimum solution we will impose the following restrictions which seem to be
reasonable: (b) the procedure should be symmetric, that is, the probability of
selecting $D_j$ when $\sigma_1^2 = \cdots = \sigma_{j-1}^2 = \sigma_{j+1}^2 = \cdots = \sigma_k^2 = \sigma^2$ and $\sigma_j^2 = \lambda^2\sigma^2$ should be
the same for all $j$; (c) the procedure should be invariant if all the observations
are multiplied by the same positive constant; and (d) the procedure should be
invariant if some constant $b_i$ is added to all the observations in the $i$th population.
We will now reformulate the problem as follows. We want a statistical procedure
for selecting one of the $k + 1$ decisions $D_0, D_1, \ldots, D_k$ which, subject
to conditions (a), (b), (c), and (d) will maximize the probability of making the
correct decision when one of the populations has slipped to the right. We shall
prove that the optimum solution is the following:


$$\text{if } s_M^2 \Big/ \sum_{\alpha=1}^{k} s_\alpha^2 > L_\alpha, \text{ select } D_M;$$

$$\text{if } s_M^2 \Big/ \sum_{\alpha=1}^{k} s_\alpha^2 \le L_\alpha, \text{ select } D_0,$$

where $M$ denotes the population yielding the largest sample variance. $L_\alpha$ is a
constant whose value is determined by restriction (a). This statistic has been
suggested, on intuitive grounds, by Cochran [2], and a good tabulation of $L_\alpha$
for several values of $\alpha$, $n$, and $k$ is available [3].
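
The decision rule can be sketched in Python as follows. The function names are hypothetical. The tabulated constants of [3] are not reproduced here, so `simulated_l_alpha` is only a Monte Carlo stand-in that estimates the upper $\alpha$ point of the ratio under equal variances.

```python
import random


def sample_variance(x):
    """Unbiased sample variance (divisor n - 1)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / (len(x) - 1)


def variance_ratio(samples):
    """Return (M, ratio): the 1-based index M of the sample with the largest
    variance s_M^2, and s_M^2 divided by the sum of all the s_i^2."""
    s2 = [sample_variance(x) for x in samples]
    best = max(range(len(s2)), key=s2.__getitem__)
    return best + 1, s2[best] / sum(s2)


def slippage_decision(samples, l_alpha):
    """Return M if the ratio exceeds l_alpha (decision D_M), else 0 (D_0)."""
    m_index, ratio = variance_ratio(samples)
    return m_index if ratio > l_alpha else 0


def simulated_l_alpha(k, n, alpha, reps=5000, seed=1):
    """Monte Carlo stand-in for the tabulated constant: the empirical upper
    alpha point of the ratio when all k normal populations have equal variance."""
    rng = random.Random(seed)
    ratios = sorted(
        variance_ratio([[rng.gauss(0.0, 1.0) for _ in range(n)]
                        for _ in range(k)])[1]
        for _ in range(reps)
    )
    return ratios[min(reps - 1, int((1 - alpha) * reps))]


if __name__ == "__main__":
    data = [[0.0, 1.0, 2.0], [0.0, 10.0, 20.0], [1.0, 2.0, 3.0]]
    print(variance_ratio(data))
    print(slippage_decision(data, simulated_l_alpha(3, 3, 0.05)))
```

Because the ratio is unchanged when every observation is rescaled or when a constant is added within a sample, the sketch respects restrictions (c) and (d) by construction.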

2. Derivation of the optimum procedure. Since $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k, s_1^2, s_2^2, \ldots, s_k^2)$
constitute a set of sufficient statistics for the unknown parameters $(m_1, m_2, \ldots,
m_k, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2)$ there will be no loss in considering only procedures which
depend on this set of statistics. We can also show that any allowable procedure
will depend only on $(s_1^2, s_2^2, \ldots, s_k^2)$. Let $\delta(\bar{x}_1, \ldots, \bar{x}_k, s_1^2, \ldots, s_k^2)$ denote any
allowable decision procedure. If $\delta$ depends on one or more of the $\bar{x}_i$, then for
some pair of sets $(\bar{x}_1, \ldots, \bar{x}_k)$ and $(\bar{x}_1', \ldots, \bar{x}_k')$ we have
$\delta(\bar{x}_1, \ldots, \bar{x}_k, s_1^2, \ldots, s_k^2) \ne \delta(\bar{x}_1', \ldots, \bar{x}_k', s_1^2, \ldots, s_k^2)$.
Now we define $b_i = \bar{x}_i' - \bar{x}_i$, and we have the following:

$$\delta(\bar{x}_1', \ldots, \bar{x}_k', s_1^2, \ldots, s_k^2) \ne \delta(\bar{x}_1' - b_1, \ldots, \bar{x}_k' - b_k, s_1^2, \ldots, s_k^2).$$

This, however, contradicts restriction (d) which states that any allowable procedure
is invariant if a constant $b_i$ is added to each observation in the $i$th population.
Also, because of restriction (c), any allowable procedure will depend only
on the $k - 1$ statistics $s_1^2/s_k^2, \ldots, s_{k-1}^2/s_k^2$. Let $u_\alpha = s_\alpha^2/s_k^2$ for $\alpha = 1, 2, \ldots,
k - 1$ and $v_\alpha = \sigma_\alpha^2/\sigma_k^2$ for $\alpha = 1, 2, \ldots, k - 1$. The joint probability density of
$u_1, u_2, \ldots, u_{k-1}$ will depend on the parameters $v_1, v_2, \ldots, v_{k-1}$. We will let
$\bar{D}_0$ denote the decision that $v_1 = v_2 = \cdots = v_{k-1} = 1$ and $\bar{D}_i$ $(i = 1, 2, \ldots, k - 1)$
denote the decision that $v_1 = \cdots = v_{i-1} = v_{i+1} = \cdots = v_{k-1} = 1$ and
$v_i = \lambda^2$, and let $\bar{D}_k$ be the decision that $v_1 = v_2 = \cdots = v_{k-1} = 1/\lambda^2$. Since any
allowable procedure for selecting one of the set $(D_0, D_1, \ldots, D_k)$ will be a function
of $(u_1, u_2, \ldots, u_{k-1})$, it can be transformed into a procedure for selecting
one of the set $(\bar{D}_0, \bar{D}_1, \ldots, \bar{D}_k)$ by making $D_i$ correspond to $\bar{D}_i$ for $i = 0,
1, \ldots, k$. Because of (a) the probability that any transformed procedure will
select $\bar{D}_0$ when $v_1 = v_2 = \cdots = v_{k-1} = 1$ must be $1 - \alpha$. Also, the probability
that any procedure will select $D_i$ when $\sigma_1^2 = \cdots = \sigma_{i-1}^2 = \sigma_{i+1}^2 = \cdots = \sigma_k^2 =
\sigma^2$ and $\sigma_i^2 = \lambda^2\sigma^2$ must be equal to the probability that the transformed procedure
will select $\bar{D}_i$ when $\bar{D}_i$ is true. This probability must be the same for all $i$.
The proof that the indicated solution is the optimum solution consists mainly
of showing that there exists a set of nonzero a priori probabilities $p_0, p_1, \ldots,
p_k$ which are functions of $\lambda$, so that when the decision procedure is transformed
into a procedure for selecting one of the set $(\bar{D}_0, \bar{D}_1, \ldots, \bar{D}_k)$ it will maximize
the probability of making the correct decision among the set $(\bar{D}_0, \bar{D}_1, \ldots, \bar{D}_k)$
when $p_i$ is the a priori probability that $\bar{D}_i$ is true. This will be equivalent to
showing that the procedure for selecting one of the set $(\bar{D}_0, \bar{D}_1, \ldots, \bar{D}_k)$ is a
Bayes solution with respect to $p_0, p_1, \ldots, p_k$ when we introduce the loss function
$W_{ij} = 1$ if $i \ne j$ and $W_{ij} = 0$ if $i = j$, where $W_{ij}$ represents the loss in making
decision $\bar{D}_j$ when $\bar{D}_i$ is true. Assuming that we have shown this, it follows
that the indicated solution is the optimum solution. For suppose there exists
another allowable procedure $\delta^*$ which for some $\lambda$ has a greater probability of
making the correct decision when some population has slipped to the right.
Then $\delta^*$, which must be a function of $u_1, u_2, \ldots, u_{k-1}$, when transformed into
a procedure for selecting one of the set $(\bar{D}_0, \bar{D}_1, \ldots, \bar{D}_k)$, will have a greater
probability than the indicated solution of making the correct decision when $\bar{D}_i$
is true $(i = 1, 2, \ldots, k)$ and will have, because of (a), the same probability
when $\bar{D}_0$ is true. This contradicts the fact that the indicated solution is a Bayes
solution relative to the nonzero probabilities $p_0(\lambda), p_1(\lambda), \ldots, p_k(\lambda)$ since its
Bayes risk is larger than that of $\delta^*$.

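Since our procedure will depend on $u_1, u_2, \ldots, u_{k-1}$, we will need to find the
joint probability density of these random variables. It is easy to verify that this
is given by

$$g(u_1, \ldots, u_{k-1}) = C\,\frac{[u_1 u_2 \cdots u_{k-1}]^{(n-3)/2}}
{[v_1 v_2 \cdots v_{k-1}]^{(n-1)/2}\left[\sum_{\alpha=1}^{k-1} u_\alpha/v_\alpha + 1\right]^{k(n-1)/2}},
\qquad C = \frac{\Gamma\left(k(n-1)/2\right)}{\left[\Gamma\left((n-1)/2\right)\right]^{k}}.$$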

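Let $g_i = g(u_1, u_2, \ldots, u_{k-1} \mid \bar{D}_i)$ be the joint probability density of $u_1, u_2,
\ldots, u_{k-1}$ when $\bar{D}_i$ is true. Let $p_0, p_1, \ldots, p_k$ be a set of a priori probabilities, where
$p_i$ is the a priori probability that $\bar{D}_i$ is true. The decision procedure which maximizes
the probability of making the correct decision is the Bayes solution with
respect to $p_0, p_1, \ldots, p_k$ and this is known to be given by the rule: for each $j$
$(j = 0, 1, \ldots, k)$, select $\bar{D}_j$ for all points in the $u_1, u_2, \ldots, u_{k-1}$ space where
$p_j g_j = \max(p_0 g_0, p_1 g_1, \ldots, p_k g_k)$ [4]. Consider the special a priori distribution
$p_0 = 1 - kp$, $p_1 = p_2 = \cdots = p_k = p$. We can then calculate for each $j$ the
region where $\bar{D}_j$ is selected.

As an example we will compute the region where $\bar{D}_1$ is selected. We must have
$g_1 > g_j$ for $j = 2, 3, \ldots, k$, and $p g_1 > (1 - kp) g_0$.

Region where $g_1 > g_j$ $(j = 2, 3, \ldots, k - 1)$.

$$g_1 = C\,\frac{[u_1 u_2 \cdots u_{k-1}]^{(n-3)/2}}
{(\lambda^2)^{(n-1)/2}\left[\sum_{\alpha=1}^{k-1} u_\alpha - u_1\left(1 - \frac{1}{\lambda^2}\right) + 1\right]^{k(n-1)/2}},
\qquad
g_j = C\,\frac{[u_1 u_2 \cdots u_{k-1}]^{(n-3)/2}}
{(\lambda^2)^{(n-1)/2}\left[\sum_{\alpha=1}^{k-1} u_\alpha - u_j\left(1 - \frac{1}{\lambda^2}\right) + 1\right]^{k(n-1)/2}}.$$

The region where $g_1 > g_j$ is given by

$$\left[\sum_{\alpha=1}^{k-1} u_\alpha - u_j\left(1 - \frac{1}{\lambda^2}\right) + 1\right]^{k(n-1)/2}
> \left[\sum_{\alpha=1}^{k-1} u_\alpha - u_1\left(1 - \frac{1}{\lambda^2}\right) + 1\right]^{k(n-1)/2},$$

or equivalently $u_1 > u_j$ for $j = 2, 3, \ldots, k - 1$.

Region where $g_1 > g_k$.

$$g_k = C\,\frac{[u_1 u_2 \cdots u_{k-1}]^{(n-3)/2}}
{(\lambda^2)^{(n-1)/2}\left[\sum_{\alpha=1}^{k-1} u_\alpha + \frac{1}{\lambda^2}\right]^{k(n-1)/2}}.$$

Hence we must have

$$\sum_{\alpha=1}^{k-1} u_\alpha - u_1\left(1 - \frac{1}{\lambda^2}\right) + 1 < \sum_{\alpha=1}^{k-1} u_\alpha + \frac{1}{\lambda^2},$$

which reduces to $(\lambda^2 - 1)(1 - u_1) < 0$. Since $\lambda^2 > 1$, this requires $u_1 > 1$.

Region where $p g_1 > (1 - kp) g_0$.

$$g_0 = C\,\frac{[u_1 u_2 \cdots u_{k-1}]^{(n-3)/2}}{\left[\sum_{\alpha=1}^{k-1} u_\alpha + 1\right]^{k(n-1)/2}}.$$

Thus, $p g_1 > (1 - kp) g_0$ is equivalent with

$$\frac{p}{(\lambda^2)^{(n-1)/2}\left[\sum_{\alpha=1}^{k-1} u_\alpha - u_1\left(1 - \frac{1}{\lambda^2}\right) + 1\right]^{k(n-1)/2}}
> \frac{1 - kp}{\left[\sum_{\alpha=1}^{k-1} u_\alpha + 1\right]^{k(n-1)/2}}.$$

This may be expressed as

$$\frac{p}{(\lambda^2)^{(n-1)/2}(1 - kp)}
\left[1 - \left(1 - \frac{1}{\lambda^2}\right)\frac{u_1}{\sum_{\alpha=1}^{k-1} u_\alpha + 1}\right]^{-k(n-1)/2} > 1,$$

where the left-hand side is a monotonically increasing function of
$u_1/(\sum_{\alpha=1}^{k-1} u_\alpha + 1)$. Therefore this region must be of the form $u_1/(\sum_{\alpha=1}^{k-1} u_\alpha + 1)
> L^*$ where $L^*$ may be written as

$$L^* = \frac{\lambda^2}{\lambda^2 - 1}\left[1 - \left(\frac{p}{(1 - kp)(\lambda^2)^{(n-1)/2}}\right)^{2/[k(n-1)]}\right].$$

Hence we would select $\bar{D}_1$ if $u_1 > 1$, $u_1 > u_j$ for $j = 2, 3, \ldots, k - 1$, and
$u_1/(\sum_{\alpha=1}^{k-1} u_\alpha + 1) > L^*$. Similarly we can show that $\bar{D}_i$ is selected for $i =
1, 2, \ldots, k - 1$ if $u_i > 1$, $u_i > u_j$ for $j = 1, 2, \ldots, i - 1, i + 1, \ldots, k - 1$,
and if $u_i/(\sum_{\alpha=1}^{k-1} u_\alpha + 1) > L^*$. It remains to calculate the region where $\bar{D}_k$ is
selected. This is obtained in a similar manner and the result is stated without
derivation. Accept $\bar{D}_k$ when $u_j < 1$ $(j = 1, 2, \ldots, k - 1)$ and $1/(\sum_{\alpha=1}^{k-1} u_\alpha + 1)
> L^*$, where $L^*$ is the same constant as above.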

Hence the Bayes solution with respect to $(1 - kp), p, p, \ldots, p$ is the following:
for $1 \le j \le k - 1$ select $\bar{D}_j$ if $u_j > 1$, and $u_j > \max(u_1, u_2, \ldots, u_{j-1},
u_{j+1}, \ldots, u_{k-1})$, and $u_j/(\sum_{\alpha=1}^{k-1} u_\alpha + 1) > L^*$. Select $\bar{D}_k$ if $u_j < 1$ for
$j = 1, 2, \ldots, k - 1$ and $1/(\sum_{\alpha=1}^{k-1} u_\alpha + 1) > L^*$. Otherwise select $\bar{D}_0$.

The existence of a priori probabilities will now be shown for which the above
procedure, with $L^*$ replaced by the fixed $L_\alpha$ determined by condition (a), is a
Bayes solution. Let us define the function

$$F(p) = \frac{p}{(\lambda^2)^{(n-1)/2}} - (1 - kp)\left[1 - \left(1 - \frac{1}{\lambda^2}\right)L_\alpha\right]^{k(n-1)/2}.$$

This is a continuous function of $p$ with $F(0) < 0$, and $F(1/k) > 0$. Hence, there
exists a $p^*$ with $0 < p^* < 1/k$ which is a function of $\lambda^2$ so that $F(p^*) = 0$. To
get the Bayes solution relative to $(1 - kp^*, p^*, \ldots, p^*)$ we merely replace $L^*$
by $L_\alpha$.
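
The prior weight $p^*$ can also be obtained in closed form by setting $L^* = L_\alpha$ in the expression for $L^*$ and solving for $p$. The Python sketch below does this and confirms numerically that the two constants then coincide. The function names and the numerical inputs are illustrative assumptions, not values from the paper.

```python
def l_star(p, lam2, n, k):
    """L* as a function of the prior weight p, with lam2 = lambda^2 > 1."""
    a = p / ((1.0 - k * p) * lam2 ** ((n - 1) / 2.0))
    return lam2 / (lam2 - 1.0) * (1.0 - a ** (2.0 / (k * (n - 1))))


def p_star(l_alpha, lam2, n, k):
    """Prior weight p* for which l_star(p*) equals l_alpha.

    Setting L* = L_alpha gives p / ((1 - k p) lam2^((n-1)/2)) = b with
    b = [1 - (1 - 1/lam2) L_alpha]^(k (n-1) / 2), which is linear in p.
    """
    b = (1.0 - (1.0 - 1.0 / lam2) * l_alpha) ** (k * (n - 1) / 2.0)
    a = lam2 ** (-(n - 1) / 2.0)
    return b / (a + k * b)


if __name__ == "__main__":
    # Illustrative inputs only: L_alpha = 0.5, lambda^2 = 4, n = 10, k = 5.
    p = p_star(0.5, 4.0, 10, 5)
    print(p, l_star(p, 4.0, 10, 5))
```

Since $b > 0$ whenever $L_\alpha \le 1$, the value returned always lies strictly between 0 and $1/k$, in agreement with the existence argument.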

We now substitute $u_\alpha = s_\alpha^2/s_k^2$ and the Bayes solution relative to $(1 - kp^*,
p^*, \ldots, p^*)$ reduces to the following when $\bar{D}_i$ is made to correspond to $D_i$:

$$\text{if } s_M^2 \Big/ \sum_{\alpha=1}^{k} s_\alpha^2 > L_\alpha, \text{ select } D_M,$$

$$\text{if } s_M^2 \Big/ \sum_{\alpha=1}^{k} s_\alpha^2 \le L_\alpha, \text{ select } D_0,$$

where $s_M^2 = \max(s_1^2, s_2^2, \ldots, s_k^2)$. Since this is an allowable procedure we have
proved it is an optimum one.

AN EXTENSION OF THE BUFFON NEEDLE PROBLEM
By Nathan Mantel


Biometrics Section, National Cancer Institute 


Bethesda, Maryland 


1. Introduction. An empirical determination of the value of $\pi$ can be made
from the relationship

(1)    $E = 2L_1L_2/(\pi A)$,

where $E$ is the expected number of intersections of a group of line segments of
total length $L_1$ with a group of line segments of total length $L_2$, both groups
being distributed over an area $A$. This relationship applies under the following
conditions.

(i) The arrangement of the two groups of line segments on the area $A$ must
be independent of each other, but the individual line segments of a group may
have a systematic arrangement relative to each other.

(ii) The arrangement of at least one of the two groups of line segments on
the area $A$ must be at random. The randomness must be such that the probability
of a specified point on a line segment falling into a sub-area of $A$ is proportional
to its area and the segment may assume any angle relative to some
base line with equal probability.

Two applications of this relationship to the estimation of $\pi$ are considered
below.


2. The Buffon needle problem using a parallel line system. Consider an area
$A$ on which is superimposed a series of equally spaced parallel lines (without
loss of generality we shall take the common distance between them to be unity)
on which a straight line of length $L \le 1$ is allowed to fall at random. At each
fall the line must either intersect the series of parallel lines only once, or not at
all. Thus the expected number of intersections, $E$, is the probability, $P$, of an
intersection occurring at a fall. And since for this system the total length of the
parallel lines is $A$ (border effects would result in a total length different from
$A$; for areas with dimensions which are large relative to the distance between
parallel lines these border effects would be trivial), $P = 2L/\pi$. Thus from an
empirical determination of the probability, $\hat{P}$, can be made an empirical
determination,

(2)    $\hat{\pi} = 2L/\hat{P}$.

This determination is subject to sampling variation. Taking into account that,
on the basis of $N$ falls, the standard error of $\hat{P}$ is $\sqrt{P(1 - P)/N}$, it follows that,
asymptotically,

(3)    $\mathrm{SE}\,\hat{\pi} = \pi\sqrt{(\pi - 2L)/(2LN)}$.

This formula indicates that more precise estimates of $\pi$ can be made by
using a longer line relative to the spacing of the parallel lines.
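
The estimate (2) and the standard error (3) can be illustrated by simulation. The Python sketch below is a minimal version under stated assumptions: the lines are one unit apart, the needle centre is uniform between two lines, and the direction is drawn by rejection sampling in the unit disc so that the value of $\pi$ is not used in generating the falls. All function names are hypothetical.

```python
import math
import random


def random_sine(rng):
    """|sin(theta)| for a uniformly random direction, drawn by rejection
    sampling in the unit quarter-disc so that pi itself is never used."""
    while True:
        a, b = rng.random(), rng.random()
        d2 = a * a + b * b
        if 0.0 < d2 <= 1.0:
            return b / math.sqrt(d2)


def buffon_estimate(length, falls, seed=0):
    """Estimate pi as 2L / P_hat for a needle of the given length (<= 1)
    dropped on parallel lines one unit apart."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(falls):
        centre = rng.random() / 2.0  # distance from needle centre to nearest line
        if centre <= (length / 2.0) * random_sine(rng):
            hits += 1
    return 2.0 * length * falls / hits


def se_buffon(length, falls):
    """Asymptotic standard error from formula (3); the true pi is used here
    only to report the theoretical value."""
    return math.pi * math.sqrt((math.pi - 2.0 * length) / (2.0 * length * falls))


if __name__ == "__main__":
    n_falls = 200_000
    print(buffon_estimate(1.0, n_falls), se_buffon(1.0, n_falls))
```

Shortening the needle in `se_buffon` increases the standard error, which is the point made by formula (3).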

Various empirical determinations of the value of $\pi$ have been made and
published making use of the foregoing relationship, the experimental results
serving simultaneously as an empirical demonstration of the correctness of
Bernoulli’s theorem. Curiously enough, virtually all the results published have
been closer to the expected value than should be expected, with some significantly
too close. Apparently only those experiments which gave good results
have been published. However, one example in the literature gives patent
indication of having been terminated when the results obtained were good.
Thus Lazzerini’s experiment in 1901 with 3,408 falls provided an estimate of $\pi$
equal to 3.1415929, having an error of only 0.0000003. Terminating the experi- 
ment one fall sooner or later would inevitably have lost half the decimal places 
of accuracy. 


3. The Cartesian grid system. Consider an area $A$ on which are superimposed
two series of equally unit-spaced parallel lines, the two series being at right
angles to each other. The expected number of intersections with this system of
a straight line of length $L$ falling at random is $4L/\pi$ for all values of $L$. The
estimate of $\pi$ yielded by $N$ falls with an empirical average of $\bar{c}$ intersections per
fall is

(4)    $\hat{\pi} = 4L/\bar{c}$

with standard error of estimate

(5)    $\mathrm{SE}\,\hat{\pi} = \pi^2\sigma_c/(4L\sqrt{N})$,

where $\sigma_c$ is the standard deviation of the number of intersections at a fall. This
standard deviation can be evaluated either theoretically or empirically for any
value of $L$.


The theoretical evaluation of $\sigma_c$ for large $L$ is of interest. Consider an $L$ so
large ($L \gg 1$) that certain marginal effects can be disregarded. (These marginal
effects arise from the actual location of the end of the line within the square in
which it falls; they would slightly increase the value of $\sigma_c^2$ over what is shown
here, but would have no effect on $E(c)$.) For any given angle $\theta$ at which the line
falls, there would be $L|\sin\theta|$ vertical intersections and $L|\cos\theta|$ horizontal
intersections. Then the expected number of intersections is given by

(6)    $E(c) = \frac{2}{\pi}\int_0^{\pi/2} L(\sin\theta + \cos\theta)\,d\theta = 4L/\pi$.

The expected square of the number of intersections is given by

(7)    $E(c^2) = \frac{2}{\pi}\int_0^{\pi/2} L^2(\sin\theta + \cos\theta)^2\,d\theta = L^2\left(1 + \frac{2}{\pi}\right)$.

These yield

(8)    $\sigma_c^2 = E(c^2) - E^2(c) = L^2\left(1 + \frac{2}{\pi} - \frac{16}{\pi^2}\right)$

and substituting in (5)

(9)    $\mathrm{SE}\,\hat{\pi} = \pi\sqrt{(\pi^2 + 2\pi - 16)/(16N)}$    for large $L$.

The quantity inside the square root sign is numerically equal to .0095/N. 
This compares with the value for the quantity inside the square root sign for 
(3) of .5708/N for L = 1 (the most efficient value of L for that situation) and 
would indicate that more information about the value of $\pi$ is yielded by a
single fall in this system (with large L) than by 60 falls in the parallel line 
system with L = 1. 
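
The two quantities being compared follow directly from (3) and (9). The short Python sketch below evaluates them and their ratio; the function names are illustrative only.

```python
import math


def grid_factor():
    """Quantity under the root in (9), multiplied by N (grid system, large L)."""
    return (math.pi ** 2 + 2.0 * math.pi - 16.0) / 16.0


def needle_factor(length):
    """Quantity under the root in (3), multiplied by N (parallel-line system)."""
    return (math.pi - 2.0 * length) / (2.0 * length)


if __name__ == "__main__":
    ratio = needle_factor(1.0) / grid_factor()
    print(f"grid: {grid_factor():.4f}")
    print(f"needle, L = 1: {needle_factor(1.0):.4f}")
    print(f"falls of the needle equivalent to one grid fall: {ratio:.1f}")
```

The ratio of the two factors is the number of parallel-line falls needed to match the precision of a single fall on the grid.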


4. An alternative estimate. The preceding section has covered the estimation
of $\pi$ from the average number of intersections per fall. Equation (8) would
suggest that an estimate can be made from the variation in number of intersections
from fall to fall. Let $V = \hat{\sigma}_c^2/L^2$, where $\hat{\sigma}_c$ is the sample standard deviation
of intersections per fall. Then equation (8) yields as an estimate of $\pi$ the
solution to $(1 - V)\hat{\pi}^2 + 2\hat{\pi} - 16 = 0$,

(10)    $\hat{\pi} = \left(\sqrt{17 - 16V} - 1\right)/(1 - V)$.


How good are estimates so obtained? For any sample, $V$ must lie between
0 (all falls give same number of intersections) and $\tfrac{1}{4}(3 - 2\sqrt{2})$ (half the
falls parallel or perpendicular to system, remainder at an angle $\pi/4$ or $3\pi/4$;
that is, half the lines have the minimum number of intersections and half have
the maximum number). But corresponding to $V = 0$, $\hat{\pi} = 3.1231$, and corresponding
to $V = \tfrac{1}{4}(3 - 2\sqrt{2})$, $\hat{\pi} = 3.1752$. This can be considered a demonstration
that $\pi$ must lie between 3.1231 and 3.1752, and indicates that the procedure
will give satisfactory estimates.

Table I gives approximate 90 per cent probability limits for estimates of $\pi$
based on 101 falls by the three methods considered. For estimates based on the
average number of intersections the limits are given by $\pi \pm 1.645\,\mathrm{SE}\,\hat{\pi}$. For the
estimate based on variation in the number of intersections, the limits are the
estimates corresponding to $V = (1/1.28)(1 + 2/\pi - 16/\pi^2) = .0121$ and
$V = 1.24(1 + 2/\pi - 16/\pi^2) = .0192$ where 1/1.28 and 1.24 are the 5th and
95th percentiles of the distribution of $F_{100,\infty}$ respectively.


TABLE I
Estimates of $\pi$ based on 101 falls; 90% probability limits

Buffon needle case
    L = 1 ........................................ 2.75 to 3.53
Cartesian grid system
    Mean number of intersections ................. 3.09 to 3.19
    Variation in number of intersections ......... 3.138 to 3.146
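
The entries of Table I, and the two extreme estimates quoted for $V = 0$ and $V = \tfrac{1}{4}(3 - 2\sqrt{2})$, can be recomputed from formulas (3), (8), and (9) together with the positive root of the quadratic in Section 4. The Python sketch below is one way to do so. The function names are hypothetical, and the needle row assumes $L = 1$.

```python
import math


def pi_from_variation(v):
    """Positive root of (1 - V) x^2 + 2 x - 16 = 0."""
    return (math.sqrt(17.0 - 16.0 * v) - 1.0) / (1.0 - v)


def table_limits(falls=101, z=1.645, f_low=1.0 / 1.28, f_high=1.24):
    """90 per cent limits for the three methods; f_low and f_high are the
    5th and 95th percentiles of F(100, infinity) quoted in the text."""
    pi = math.pi
    se_needle = pi * math.sqrt((pi - 2.0) / (2.0 * falls))            # (3), L = 1
    se_grid = pi * math.sqrt((pi ** 2 + 2.0 * pi - 16.0) / (16.0 * falls))  # (9)
    v_true = 1.0 + 2.0 / pi - 16.0 / pi ** 2                           # (8) / L^2
    return {
        "needle": (pi - z * se_needle, pi + z * se_needle),
        "grid mean": (pi - z * se_grid, pi + z * se_grid),
        "grid variation": (pi_from_variation(f_low * v_true),
                           pi_from_variation(f_high * v_true)),
    }


if __name__ == "__main__":
    print(pi_from_variation(0.0))
    print(pi_from_variation((3.0 - 2.0 * math.sqrt(2.0)) / 4.0))
    for name, (low, high) in table_limits().items():
        print(f"{name}: {low:.3f} to {high:.3f}")
```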


The estimate based on variation in the number of intersections is relatively 
insensitive to counting and measurement errors. Thus a 10 per cent error in 
measuring $L$ will produce only 1/9 of 1 per cent error in the estimate of $\pi$. A similar
error in measuring $L$ will produce a 10 per cent error in the estimate of $\pi$ by the
other methods. It should be remarked that the situation here is unusual in that 
the sample variance provides a much better estimate of the true mean number 
of intersections than does the sample mean. This is in contrast with the case of the 
Poisson distribution for which the sample mean provides the best estimate of 
the population variance.

A HIGHER ORDER COMPLETE CLASS THEOREM
By Lionel Weiss
University of Virginia and Cornell University 
1. Introduction. The purpose of this note is to show that one can prove complete
class theorems in which the risk for each possible distribution is not only a
scalar, as is usual in the Wald theory, but actually a vector with as many components
as desired. The proof is an almost trivial reformulation of the proof of
Theorem 1 in [1]. Other known results may be extended in a similar manner.

An example of the application of the theorem will be found in Section 3. This
is a complete class theorem which requires no assignment of a loss (weight)
function, but only the classification of decisions into two classes, favorable and
unfavorable, for each distribution function. This result may be satisfactory to
those who maintain that in some or many situations the assignment of a loss
function is difficult.


2. Complete class theorems when the risk is a vector. Let x be the generic 
point of a Euclidean space Z (the extension of the results of this paper to general 
abstract spaces is trivial). F(x), Fe(x), «++ , Pm(x) are m (>1) given cumulative 
probability distributions on Z. The statistician is presented with an observation 
on the chance variable X which is distributed in Z according to an unknown one 
of F;,---, Ff. On the basis of this observation he has to make one of L de- 
cisions, say d,,--- ,d,. Let s be a positive integer and W,;,(2) (¢ = 1, «++ , m; 
J=1,---,L;k = 1,--+ ,s) be measurable functions of « such that 


| Wiir(a) |dFi << @. 
A randomized decision function, hereafter called “test”? for short, and generi- 
cally designated by n(x), is defined as follows: n(x) = [m(x), --+ , nz(2)], where 
(a) n(a) is defined for all x, 
(b) O S n,(x) forj = 1,---,L, 
) 2 jut nj(a) = 1 identically in x, 
(d) »)(x) is measurable for j = 1, +--+, L. 


(¢ 


- 


Let ri. = | (5 17;(2)W4(2)) dF; and r° = (ri), @ = 1,°---,m:k 


~Z 

-, 8). Thus to each test n(x) there corresponds the sth order risk point r’. 
The test T with sth order risk point $r^s$ will be said to be uniformly better (s)
than the test T' with sth order risk point $r'^s = (r'_{ik})$ if $r_{ik} \le r'_{ik}$ for every i and k,
with the inequality sign holding for at least one pair (i, k). A test T will be called
admissible (s) if there exists no test uniformly better (s) than T. A class C of
tests will be called complete (s) if, for any test T' not in C, there exists a test T
in C which is uniformly better (s) than T'. A complete (s) class will be called
minimal if no proper subclass of it is complete (s). 
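
The ordering just defined can be checked mechanically when finitely many risk points are in hand. The following is a minimal sketch, not part of the paper: the function names and the three numerical risk points are hypothetical, and "admissible" is meant here only relative to the finite list supplied.

```python
from typing import List, Sequence

RiskPoint = Sequence[Sequence[float]]  # r[i][k]: i indexes distributions, k loss components


def uniformly_better(r: RiskPoint, r_prime: RiskPoint) -> bool:
    """Return True if r is uniformly better (s) than r_prime."""
    pairs = [(a, b) for row, row_p in zip(r, r_prime) for a, b in zip(row, row_p)]
    no_worse_everywhere = all(a <= b for a, b in pairs)
    strictly_better_somewhere = any(a < b for a, b in pairs)
    return no_worse_everywhere and strictly_better_somewhere


def admissible_among(points: Sequence[RiskPoint]) -> List[int]:
    """Indices of risk points not dominated by any other point in the list."""
    return [t for t, r in enumerate(points)
            if not any(uniformly_better(q, r) for q in points)]


# Hypothetical risk points for m = 2 distributions and s = 2 loss components.
points = [
    [[0.2, 0.5], [0.3, 0.4]],
    [[0.2, 0.6], [0.3, 0.4]],
    [[0.1, 0.7], [0.3, 0.4]],
]
print(admissible_among(points))
```

The second point is dominated by the first, while the first and third are not comparable, which is why a complete (s) class generally contains more than one test.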

Wald’s proof given in [1] obviously holds and we may state: The class of all 
admissible (s) tests is a minimal complete (s) class. 

Any set $\xi = (\xi_{ik})$ ($i = 1, \dots, m$; $k = 1, \dots, s$) of nonnegative numbers
which add to unity (a convenient normalization) will be called an a priori distribution
(s). A Bayes solution (s) with respect to $\xi$ is a test T* which minimizes

(1) $\sum_{i,k} \xi_{ik} r_{ik}(T)$

with respect to all tests T.







THEOREM. Every admissible (s) test is a Bayes solution (s) with respect to some
a priori distribution (s). Hence the class of Bayes solutions (s) is complete (s). 

PROOF. Let $\{G_{ik}(x)\}$ ($i = 1, \dots, m$; $k = 1, \dots, s$) be $m \cdot s$ distribution functions
on Z, and let $G_{ik}(x) = F_i(x)$ for every i and k. Suppose the statistician has
to make one of a set of L decisions which we may call $d_1, \dots, d_L$. Let $W_{ijk}(x)$
be the loss incurred when x is the observed point, $G_{ik}(x)$ is the distribution function
of X, and the decision $d_j$ is made. Let $r'^{1}(\eta)$ be the first order risk point of
this problem for the decision function $\eta(x)$. Then $r^s(\eta) = r'^{1}(\eta)$. The desired
result now follows from Theorem 1 of [1], for the requirement made there that
the $F_i$ are distinct is never used.


Let $f_i$ be the density function of $F_i$ with respect to a measure $\mu$ with respect
to which all $F_i$ are absolutely continuous. There is always such a measure. To
construct an sth order Bayes solution with respect to $\xi = (\xi_{ik})$ ($i = 1, \dots, m$;
$k = 1, \dots, s$) one may proceed as follows: $\eta_j(x) = 0$ for all j ($j = 1, \dots, L$)
for which $\sum_{i=1}^{m} \sum_{k=1}^{s} \xi_{ik} f_i(x) W_{ijk}(x)$ is not a minimum with respect to j; $\eta_j(x)$
is defined arbitrarily between zero and one, inclusive, for all other j, provided
only that every component of the resulting $\eta(x)$ is measurable and the sum is
always one.
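
The construction can be sketched for a finite sample space, where $\mu$ is counting measure and the densities are probability mass functions. Everything numerical below (the two distributions, the losses, the a priori distribution) is a hypothetical illustration rather than anything taken from the paper, and ties are broken by taking the smallest minimizing j, which is one of the arbitrary choices the text allows.

```python
from itertools import product


def bayes_test(f, W, xi):
    """Nonrandomized Bayes solution (s): eta[x][j] = 1 at a minimizing j."""
    m, n = len(f), len(f[0])
    L, s = len(W[0]), len(W[0][0])
    eta = []
    for x in range(n):
        scores = [sum(xi[i][k] * f[i][x] * W[i][j][k][x]
                      for i in range(m) for k in range(s))
                  for j in range(L)]
        best = scores.index(min(scores))  # ties broken by the smallest j
        eta.append([1.0 if j == best else 0.0 for j in range(L)])
    return eta


def risk_point(eta, f, W):
    """r[i][k] = sum over x and j of eta_j(x) * W_ijk(x) * f_i(x)."""
    m, n = len(f), len(f[0])
    L, s = len(W[0]), len(W[0][0])
    return [[sum(eta[x][j] * W[i][j][k][x] * f[i][x]
                 for x in range(n) for j in range(L))
             for k in range(s)] for i in range(m)]


def weighted_risk(r, xi):
    """The quantity (1): sum over i and k of xi_ik * r_ik."""
    return sum(xi[i][k] * r[i][k] for i in range(len(r)) for k in range(len(r[0])))


# Hypothetical problem: m = 2 distributions on Z = {0, 1, 2}, L = 2, s = 2.
f = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
W = [[[[(k + 1) * (0.0 if j == i else 1.0) for x in range(3)]
       for k in range(2)] for j in range(2)] for i in range(2)]
xi = [[0.25, 0.25], [0.25, 0.25]]

eta = bayes_test(f, W, xi)
bayes_value = weighted_risk(risk_point(eta, f, W), xi)

# Brute-force check against every nonrandomized test on this small space.
for choice in product(range(2), repeat=3):
    other = [[1.0 if j == c else 0.0 for j in range(2)] for c in choice]
    assert bayes_value <= weighted_risk(risk_point(other, f, W), xi) + 1e-12
```

The brute-force loop only confirms, on this toy space, that minimizing the weighted sum pointwise in x minimizes (1) overall.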

Other results found in [1] and elsewhere may be extended in a manner similar
to that of the present theorem. 

It is also obvious that one may prove similar results with the inequality signs 
reversed, by using anti-Bayes solutions (s), that is, tests which maximize (1)
instead of minimizing it. 

Using the ideas of [2] the above results may easily be extended to the case 
where the number of distributions and/or decisions is infinite and where the 
observations are taken sequentially, to obtain $\epsilon$-complete (s) theorems.


3. Application to controlling probabilities of making the various decisions. 
P(i, j, T) will denote the probability of making decision $d_j$ when $F_i$ is actually
the distribution and the test T is employed. In other respects the notation of
Section 2 is used. For each i, we suppose that there are certain decisions which are
favorable (i.e., we prefer to make them when $F_i$ is the distribution), and the
others are unfavorable (we prefer not to make them when $F_i$ is the distribution).
We assume that for each i there is at least one favorable and one unfavorable
decision. For our present purpose, s is equal to L, the number of decisions. For
given i and k, we define $W_{ijk}(x)$ as follows. If $d_k$ is favorable relative to $F_i$,
$W_{ijk}(x) = 0$ if $j = k$, $W_{ijk}(x) = 1$ if $j \ne k$. If $d_k$ is unfavorable relative to $F_i$,
$W_{ijk}(x) = 1$ if $j = k$, $W_{ijk}(x) = 0$ if $j \ne k$. Then we have the following result.
Let T be any test which is not a Bayes solution (s). There is a Bayes solution
(s), T', such that for any i ($i = 1, \dots, m$) and any j such that $d_j$ is unfavorable
relative to $F_i$, we have $P(i, j, T') \le P(i, j, T)$; while for any i and any j' such
that $d_{j'}$ is favorable relative to $F_i$, we have $P(i, j', T') \ge P(i, j', T)$. The
inequality sign holds in at least two of these relations, one in each set.
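
With these 0-1 losses the risk components reduce to decision probabilities: $r_{ik} = 1 - P(i, k, T)$ when $d_k$ is favorable relative to $F_i$, and $r_{ik} = P(i, k, T)$ when it is unfavorable. A small sketch of that reduction follows; the probability table and the favorable sets are hypothetical values chosen only for illustration.

```python
def risk_from_probabilities(P, favorable):
    """r[i][k] under the 0-1 losses of Section 3, with s = L.

    P[i][j] is the probability of decision j under distribution i;
    favorable[i] is the set of decisions favorable relative to distribution i.
    """
    m, L = len(P), len(P[0])
    r = []
    for i in range(m):
        row = []
        for k in range(L):
            if k in favorable[i]:
                W = [0.0 if j == k else 1.0 for j in range(L)]  # loss unless d_k is made
            else:
                W = [1.0 if j == k else 0.0 for j in range(L)]  # loss only if d_k is made
            row.append(sum(P[i][j] * W[j] for j in range(L)))
        r.append(row)
    return r


# Hypothetical decision probabilities for m = 2 distributions and L = 3 decisions.
P = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
favorable = [{0, 1}, {2}]
r = risk_from_probabilities(P, favorable)
```

Lowering every $r_{ik}$ therefore lowers each unfavorable probability and raises each favorable one, which is how the complete class statement translates into the result above.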
4. Other applications. We mention two other applications of the results. If
there are s individuals with possibly different loss functions, $W_{ijk}(x)$ can denote
the loss suffered by individual k when $d_j$ is made and $F_i$ is true and x is observed.
Or different true situations may lead to the same distribution of the observable
chance variable, so that $W_{ijk}(x)$ is the loss incurred under the kth true situation
leading to the distribution $F_i$. The range of k may depend upon i, and all the
results hold.
CORRECTION OF A PROOF* 
By J. Kiefer


Cornell University 


In the proof of Theorem 3 of "On Wald's Complete Class Theorems" (Ann.
Math. Stat., Vol. 24 (1953), pp. 70-75), the inequality appearing in the definition
of $r_{\epsilon,m}(\xi)$ should be altered to read $r(\xi, \delta'') \ge r(\xi, \delta_2) - \epsilon/2$; the remainder of
the proof is then easily altered to give the desired result. Without the $\epsilon/2$, one
would still have to prove that the space D is large enough to give $\lim_{m \to \infty} r_{\epsilon,m}(\xi) < \infty$.
The author is indebted to Mr. Jerome Sacks for pointing out this fact.
ABSTRACTS OF PAPERS


(Abstracts of papers presented at the Stanford meeting of the Institute, 
June 19-20, 1953) 


1. On the Probability Function of the Quotient of Sample Ranges from a Rectangular
Distribution. Leo A. Aroian, Hughes Aircraft and Development
Laboratories, Culver City.


In a recent paper Paul R. Rider (J. Amer. Stat. Assn., Vol. 46 (1951), pp. 502-507) has
derived the probability function of $u = R_1/R_2$, the quotient of the sample ranges of two
independent random samples from $f(x) = 1/x_0$ for $0 \le x \le x_0$, $f(x) = 0$ elsewhere, where
$R_1$ is the sample range in a sample of m and $R_2$ is the sample range in a sample of n from
$f(x)$. The power function of the test is derived, and the tables are extended for the 5 per cent,
2½ per cent, 1 per cent, and ½ per cent levels of significance. In case m and n are large a Cornish-
Fisher expansion for the levels of significance is derived. The transformation $w = \tfrac{1}{2} \log_e u$
is found convenient and use is made of the moment generating function of w to find the
cumulants of w, which are needed in the Cornish-Fisher expansion. Limiting distributions
are given for m large, $n \to \infty$ and vice versa.


2. Actuarial Validity of the Binomial Distribution for Large Numbers of Lives 
with Small Mortality Probabilities. John E. Walsh, U.S. Naval Ordnance
Test Station, China Lake. 


In actuarial work, one of the most widely used probability tools is the binomial dis- 
tribution. Sufficient conditions for the validity of the binomial distribution for a group of 
lives observed during some time or age interval are: (a) The probability of death within 
the interval considered is the same for each person of the group. (b) The lives are statistically
independent with respect to mortality. Then the deaths occurring in this group during 
the observation period have a binomial probability distribution. For practical situations, 
these conditions are never exactly satisfied. Condition (a) is undoubtedly violated ap- 
preciably for many groups of lives. The close association between friends, relatives, and 
neighbors indicates that condition (b) also may be noticeably violated. Thus use of the 
binomial distribution in actuarial work might seem a very questionable procedure. This 
note investigates the applicability of the binomial distribution for a situation which appears 
to be of common actuarial occurrence and where both (a) and (b) can be noticeably violated. 
The binomial distribution is found to yield reasonably accurate probabilities for the case 
of a large number of lives if the average death probability is used in place of the common 
mortality value specified by (a). 
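
A small numerical sketch of the last statement, restricted to the simplest case: it assumes the lives are independent (so only condition (a) is violated) and uses a hypothetical group of 200 lives with two different death probabilities. The function names are illustrative; the exact distribution of the number of deaths is built by convolution and compared with the binomial that uses the average death probability.

```python
from math import comb


def deaths_distribution(probs):
    """Exact distribution of the number of deaths for independent lives."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)      # this life survives
            new[k + 1] += mass * p          # this life dies
        dist = new
    return dist


def binomial_pmf(n, p):
    """Binomial probabilities for k = 0, ..., n deaths."""
    return [comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]


# Hypothetical group: 200 lives, half with death probability 0.005, half with 0.015.
probs = [0.005, 0.015] * 100
p_bar = sum(probs) / len(probs)
exact = deaths_distribution(probs)
approx = binomial_pmf(len(probs), p_bar)

# Total variation distance between the exact and the binomial distributions.
tv = 0.5 * sum(abs(a - b) for a, b in zip(exact, approx))
print(f"average probability {p_bar:.4f}, total variation distance {tv:.5f}")
```

The sketch says nothing about violations of condition (b); dependence between lives would need a model of its own.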


3. On the Distribution of the Likelihood Ratio. Herman Chernoff, Stanford
University.


A classical result on the distribution of the likelihood ratio $\lambda$ is the following. Under
suitable regularity conditions, if the hypothesis that a parameter $\theta$ lies on an r-dimensional
hyperplane of k-dimensional space is true, the distribution of $-2 \log \lambda$ is asymptotically
that of chi square with $k - r$ degrees of freedom. On the basis of n independent
observations, let $\lambda$ be computed for a test of the hypothesis that $\theta$ lies in $\omega$, against the alternative
that $\theta$ lies in $\tau$ where $\omega$ and $\tau$ are disjoint subsets of k-dimensional space. Let the
origin be a limit point of both $\omega$ and $\tau$, and let J represent the information matrix per
observation at $\theta = 0$. If $\omega$ and $\tau$ may be suitably approximated near the origin by positively
homogeneous sets, the asymptotic distribution of $-2 \log \lambda$ when $\theta = 0$ is the same as for
the problem where the observations have a joint normal distribution with mean $\theta$ and
covariance matrix $J^{-1}$, and $\omega$ and $\tau$ are replaced by their approximations.


4. Testing the Approximate Validity of Statistical Hypotheses. J. L. Hodges, Jr.
and Erich L. Lehmann, University of California, Berkeley.


A statistical hypothesis H, in the customary formulation, is frequently known a priori 
not to be exactly true. A much discussed example of this situation is the problem of testing 
for normality. The large-sample paradoxes inherent in such a formulation (see for example 
Berkson, J. Amer. Stat. Assn., Vol. 33 (1938), p. 526) may be avoided by testing instead 
the hypothesis H’ that H is approximately valid. The chi square test of goodness of fit is 
modified to provide a large-sample test of H’. A number of related small-sample parametric 
problems are also treated. For example, a strictly unbiased test is found for the hypothesis 
that the mean $\xi$ of a normal population of unknown variance differs from its hypothetical
value by not more than a stated amount $\delta$.


5. Distribution of Correlated Means. D. S. Villars, U.S. Naval Ordnance Test
Station, China Lake.


The maximum likelihood estimate of a population mean as computed from a set of sample
means correlated by a known amount turns out to be a weighted average with definite
weighting coefficients. These coefficients have been worked out for sets from four to twelve
and correlations corresponding to degrees of overlap of one third, one half, two thirds and
three quarters. The weighted sum of squares and products of deviations of correlated means
$n \sum_{i,j} \lambda^{ij}(x_i - \bar{x})(x_j - \bar{x})$ is shown to be distributed as chi square with degrees of freedom
equal to $p - 1$. These results are applicable to the distribution of moving averages on the
null hypothesis of zero trend with time.


6. On the Detection of Sure Signals in Noise. R. C. Davis, U. S. Naval Ordnance
Test Station, Pasadena Annex.


The purpose of this paper is to simplify and extend known results on the detection of 
sure signals in the presence of noise. For a sure signal, that is, one of completely specified form
and nonrandom, upon which a background noise is superimposed linearly, several criteria
for an optimum pre-detection filter are compared. The background noise may be any 
continuous stochastic process possessing mean value zero and a known continuous 
covariance function. Specifically, it is shown that when the input noise to a predetection 
filter is Gaussian, the filter which maximizes—for a fixed false alarm probability—the 
probability of detecting the signal when present is identical with the linear filter which 
maximizes the output signal-to-noise ratio. An explicit expression is obtained for the
probability of detection. The stability of the optimum filter is discussed in some detail. 
Finally it is shown how an optimum signal shape of given energy content can be chosen. 


7. A Statistic Associated with the Joint Distribution of n Successive Amplitudes.
Preliminary Report. William C. Hoffman, U.S. Navy Electronics
Laboratory, San Diego.


In previous work the joint distribution of the n random variables $R(t_i) = \{X^2(t_i) + Y^2(t_i)\}^{1/2}$
($i = 1, 2, \dots, n$), was found for $X(t_i)$, $Y(t_i)$ from a stationary Gaussian
process. A statistic $q = (2n)^{-1} \sum_i r_i^2$ is now defined, and its characteristic function and small
sample distribution determined. The statistic q, which is an estimate of the parameter $\sigma^2$,
is shown to be unbiased, consistent, and in the case of simple Markov dependence, asymptotically
efficient in the strict sense. It is also shown that q is asymptotically normal.
Expressions for maximum likelihood estimates are found for the case of simple Markov
dependence, and $\hat\sigma^2$ is shown to be asymptotically equivalent to q in probability. A test of
independence versus simple Markov dependence is given. 


8. Some Two-Sample Tests Based on a Particular Measure of Discrepancy.
Louis H. Wegner, University of Oregon.


Let F and G be continuous cumulative probability functions. The quantity
$\theta(F, G) = \int_{-\infty}^{\infty} (F - G)^2\, dG$ is a measure of discrepancy between F and G and is such that $\theta(F, G) = 0$
if and only if F = G. E. Lehmann has proposed a distribution-free statistic which is the
minimum variance unbiased estimate of the functional $\phi(F, G) = \tfrac{1}{3} + \theta(F, G) + \theta(G, F)$.
Three other distribution-free statistics based on $\theta(F, G)$ are $\theta(F^*, G^*)$, $\theta(F^*, G^*) + \theta(G^*, F^*)$,
and $\phi(F^*, G^*)$, where $F^*$ and $G^*$ are the corresponding sample cumulative probability
functions. The above four statistics, minus their expected values and multiplied by suitable
functions of the sample sizes, are shown to have the same asymptotic distribution. Under
$H_0$: F = G, the asymptotic distribution is the same as that of the von Mises statistic.
Under $H_1$: $F \ne G$, the asymptotic distribution is normal provided F and G are restricted
slightly. The tests based on large values of these statistics are shown to be consistent tests
of $H_0$ against $H_1$. An example showing that these tests are not in general unbiased is given.
The variance of Lehmann's statistic is found in terms of F and G and the power of the
corresponding test is investigated for the alternatives $G = F^2$ and $G = F^3$.
9. Confidence Intervals for a Proportion. Preliminary Report. E. L. Crow, U.S.
Naval Ordnance Test Station, China Lake.


Certain direct modifications of the Clopper and Pearson confidence intervals for a
binomial proportion p are discussed. The sample information consists of the number r of
items with a stated characteristic in a random sample of size n. To obtain confidence intervals
with confidence coefficient at least $1 - \alpha$, Clopper and Pearson determined, for each
given p, an interval of values of r by excluding at each end of the distribution values of r
with probability mass totaling not more than $\alpha/2$. If the excluded probability mass $\alpha$ is
not so restricted, then generally shorter intervals for r, and correspondingly for p, are
obtained. One modification consists of following Clopper and Pearson except that all the
probability mass $\alpha$ is removed at one end of the distribution of r when none can be removed
at the other. Another modification, proposed by Theodore E. Sterne, consists of excluding
from the interval of r for each given p those values of r having the smallest probabilities.
For $1 - \alpha$ = 0.90, 0.95 and 0.99 and n = 1, 2, ..., 20, confidence intervals for p based on
these two modifications are tabulated and compared.
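
For orientation, the unmodified Clopper and Pearson interval that both modifications start from can be computed directly from binomial tail sums. The sketch below is an illustration under that equal-tails rule only; it does not implement either modification discussed in the abstract, and the function names and the bisection solver are choices made here, not the author's.

```python
from math import comb


def binom_cdf(r, n, p):
    """P(X <= r) for X binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(r + 1))


def _bisect_increasing(h, target, iters=200):
    """Solve h(p) = target on [0, 1] for a function h increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(r, n, alpha):
    """Equal-tails Clopper and Pearson limits for p given r successes in n trials."""
    # Lower limit: the p at which P(X >= r) equals alpha / 2.
    lower = 0.0 if r == 0 else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(r - 1, n, p), alpha / 2.0)
    # Upper limit: the p at which P(X <= r) equals alpha / 2.
    upper = 1.0 if r == n else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(r, n, p), 1.0 - alpha / 2.0)
    return lower, upper


print(clopper_pearson(3, 10, 0.05))
```

Because each tail is capped at $\alpha/2$ separately, the interval is conservative; relaxing that cap is what the two modifications exploit to obtain shorter intervals.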


10. On Estimating Both Mean and Standard Deviation of a Normal Population
from the Lowest r out of n Observations. John V. Breakwell, North
American Aviation Company, Los Angeles.


Maximum likelihood estimates $\hat\mu$ and $\hat\sigma$ of both mean and standard deviation are obtainable
from the solution of a transcendental equation involving the ratio r/n and the ratio
D/s, where D is the difference between the mean of and the highest of the lowest r observations,
while s is the standard deviation of these r observations. The asymptotically bivariate
normal distribution of $\sqrt{n}(\hat\mu - \mu)$ and $\sqrt{n}(\hat\sigma - \sigma)$ is investigated, the elements of
the covariance matrix being decreasing functions of the ratio r/n. The biases in both $\hat\mu$
and $\hat\sigma$ are negative, of order 1/n, and are certain numerically decreasing functions of the
ratio r/n.


11. Strong Consistency of Stochastic Approximation Methods. Julius R. Blum,
University of California, Berkeley. 


Robbins and Monro have constructed a stochastic approximation scheme which estimates 
consistently the root of a regression equation. Similarly Kiefer and Wolfowitz have pro- 
posed a consistent sequence of estimates for the point where an unknown regression func- 
tion achieves its maximum. It is shown that both of these schemes have the property of 
strong consistency, under somewhat weaker restrictions. A semimartingale theorem due to 
Doob is generalized and applied to prove strong convergence of a certain sequence of ran- 
dom vectors. This is applied to problems of solving k regression equations in k unknowns 
and of estimating the point where a regression function in k variables achieves its maximum. 
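
As a reminder of the kind of scheme whose convergence is at issue, here is a minimal sketch of a Robbins and Monro iteration with step sizes 1/n. The linear regression function, the Gaussian noise, the seed, and all names are assumptions made for this illustration; the run shows the scheme in action and is in no way a demonstration of the strong consistency result.

```python
import random


def robbins_monro(observe, x0, n_steps, alpha=0.0):
    """Iterate x_{n+1} = x_n - (1/n) * (y_n - alpha), where y_n = observe(x_n)."""
    x = x0
    for n in range(1, n_steps + 1):
        x = x - (1.0 / n) * (observe(x) - alpha)
    return x


# Hypothetical regression function M(x) = x - 2 observed with unit Gaussian noise;
# the root of M(x) = 0 is x = 2.
rng = random.Random(0)


def noisy_regression(x):
    return (x - 2.0) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_regression, x0=10.0, n_steps=10_000)
print(estimate)
```

In this linear special case the estimate after N steps equals the root minus the average of the N noise terms, so its error shrinks like $N^{-1/2}$.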


12. Some Probability Results for Mortality Rates Based on Insurance Data. 
John E. Walsh, U.S. Naval Ordnance Test Station, China Lake.


This paper considers the probability distribution of an observed mortality rate based 
on a large amount of insurance data. A computationally feasible method of obtaining 
significance tests and confidence intervals for the ‘‘true’’ mortality rate estimated by the 
observed rate is presented. It is only necessary to know the value of the observed rate, the 
number of units (policies, amounts, etc.) exposed to risk, and the number of units associated
with each person who died during his observation period. In the derivations the lives are 
assumed to be statistically independent but need not have the same mortality rate nor be 
observed during the same period. The tests and confidence intervals obtained are nearly 
100 per cent efficient. A generalization of the basic technique is used to derive the probability
distribution of actuarial cost functions and other quantities of interest. A method
whereby past data may be used as a help in estimating the future probability distribution 
of these actuarial functions is outlined. 


13. Extensions of the U-Test to Three Populations. Louis H. Wegner, University
of Oregon.


Let F, G, and H be continuous cumulative probability functions. Denote the classes of
triples (F, G, H) such that (1) $F = G = H$, (2) $F < G < H$, and (3) $F \le G \le H$ (where at
least one of the inequality signs holds) by $C_1$, $C_2$, and $C_3$ respectively. Two extensions of
the Wilcoxon-Mann-Whitney U-test are proposed and are shown to be consistent and
unbiased tests of $C_1$ against $C_3$. The test statistics are shown to be asymptotically normal
under $C_1$ and also under $C_3$ provided that for one of the statistics $C_3$ is slightly restricted.
Certain moments are found in terms of F, G, and H. Finally, it is shown that a test of $C_1$
against $C_2$ proposed by D. R. Whitney is unbiased against $C_3$, consistent against $C_2$, but
not consistent against $C_3$.


14. Normal Regression Theory and Some Classical Statistics in Multivariate
Analysis. Junjiro Ogawa, Osaka University.


The purpose of this paper is to give a new method of derivation of the sampling distributions
of the multiple correlation coefficient and Hotelling's squared generalized Student's
ratio $T^2$. Although the results here presented are well known and now classical,
there seems to be some interest from the methodological point of view. The fundamental idea
of this paper was suggested by Prof. G. Elfving's 1947 paper (G. Elfving, "A simple method
of deducing certain distributions connected with multivariate sampling," Skand. Aktuarietids.,
Vol. 29-30 (1947), pp. 56-74). In this paper Prof. G. Elfving had attempted the
systematic derivation of the sampling distributions of the classical statistics in multivariate
analysis utilizing the geometrical interpretations of the results of the normal
regression theory, but with respect to the two statistics mentioned above, he succeeded in
deriving their sampling distributions only in the null cases. Here we shall show that our
method gives their sampling distributions in general cases. In this connection, we had to
describe the main results of the normal regression theory somewhat more precisely than
those which are seen in the literature, at least as far as the writer knows.


15. The Use of Maximum Likelihood Estimates in Chi Square Tests of Goodness
of Fit. Herman Chernoff and Erich L. Lehmann, Stanford University
and University of California, Berkeley.


Consider the problem of testing that a sample comes from a distribution of given form. 
The test is performed by counting the number of observations falling into specified cells 
and applying the $\chi^2$ test to these frequencies. In estimating the parameters for this test
one may use the maximum likelihood (or asymptotically equivalent) estimates based (1)
on the cell frequencies or (2) on the original observations. It is pointed out that in (2)
(unlike the well known result for (1)) the test statistic does not have a limiting $\chi^2$-distribution,
but that it is stochastically larger than would be expected under the $\chi^2$ theory. The
limiting distribution is obtained and some examples are computed. These indicate that the 
error is not serious in the case of fitting a Poisson distribution, but may be so for the fitting 
of a normal. 


16. On the Treatment of Ties in Nonparametric Tests. Joseph Putter, University
of California, Berkeley.


In applying rank tests to tied observations, two alternative procedures are customarily
used: either the tied observations are "randomized," that is, ordered in a way depending
on the outcome of an additional random experiment, or the definition of the test statistic
is appropriately extended to cover the tied case. For (1) the Wilcoxon two-sample test and
(2) the sign test, the asymptotic distributions of the statistics concerned are derived. The
performances of the two alternative procedures are then compared, using Pitman's concept
of asymptotic relative efficiency (cf. Noether, Ann. Math. Stat., Vol. 21 (1950), p. 241).
In both cases, the "randomized" test is proved to be less efficient than the "nonrandomized"
test given by the modified test statistic. In (1), the modified test statistic is essentially the
one suggested by Kruskal and Wallis (Ann. Math. Stat., Vol. 23 (1952), p. 538); the asymptotic
relative efficiency of the randomized test with respect to the nonrandomized test is $1 - \sum p_i^3$,
where $p_i$ are the jumps of the relevant underlying distribution at its discontinuities. In
(2), the nonrandomized test consists essentially of ignoring the zero differences $x_i - y_i$;
the analogous asymptotic relative efficiency is $1 - p_0$, where $p_0 = P(X_i = Y_i)$. (Research
sponsored by the Bureau of Naval Research.)


17. Asymptotic Relative Efficiency of Some Rank Tests for Analysis of Variance
Problems. F. C. Andrews, Stanford University.


Let $\{X_{ij};\ i = 1, 2, \dots, c;\ j = 1, 2, \dots, n_i\}$ be independent random variables with
$F_i(x)$ the continuous distribution function of $X_{ij}$, $n_i = s_i n$. Several nonparametric tests
for the hypothesis $F_1 = F_2 = \cdots = F_c$ have been proposed. To study the asymptotic behavior
of two of these tests against translation differences, the sequence of alternative
hypotheses $F_i(x) = F(x + \theta_i/\sqrt{n})$, $i = 1, 2, \dots, c$, $\sum_{i=1}^{c} (\theta_i - \bar\theta)^2 > 0$ is assumed. With
mild assumptions on F, for this type of alternative hypothesis, the limiting distribution
as $n \to \infty$ of the Wallis-Kruskal H statistic is shown to be $\chi^2_{c-1}(\lambda^H)$ (noncentral $\chi^2$ with
$c - 1$ degrees of freedom and noncentrality parameter $\lambda^H$), where

$$\lambda^H = 12 \Bigl[\int_{-\infty}^{\infty} (F'(x))^2\,dx\Bigr]^2 \sum_{i=1}^{c} s_i(\theta_i - \bar\theta)^2;$$

that of the Mood-Brown median statistic is $\chi^2_{c-1}(\lambda^M)$ with

$$\lambda^M = 4 (F'(a))^2 \sum_{i=1}^{c} s_i(\theta_i - \bar\theta)^2, \qquad F(a) = \tfrac{1}{2}.$$

These results are used to determine the asymptotic
relative efficiency (a.r.e.) of the median test with respect to the H-test, which is
$(F'(a))^2 / \{3[\int_{-\infty}^{\infty} (F'(x))^2\,dx]^2\}$, and the a.r.e. of the H-test with respect to the classical F test,
which is $12\sigma_F^2[\int_{-\infty}^{\infty} (F'(x))^2\,dx]^2$, where $\sigma_F^2$ is the variance of the d.f. F. These last a.r.e. results are
independent of the power, level of significance, and c, and so agree with the known results
in the two sample case.


18. Application of the Studentized Maximum Chi-Distribution. Preliminary
Report. T. A. Jeeves, University of California, Berkeley.


If U is the maximum of $X_1, X_2, \dots, X_k$, where $X_i$ has a chi-distribution with m degrees
of freedom, and $n^{1/2} Y$ has a chi-distribution with n degrees of freedom, then the distribution
of Z = U/Y is termed the studentized maximum chi-distribution. This statistic can be
applied to obtain confidence bands in problems of multiple regression and analysis of
covariance. For the comparison of many regression lines (hyperplanes), the bands so obtained
are the Neyman-narrowest bands about each line (hyperplane). With a slight modification
the variance need not be assumed the same about each line. This statistic has also
been applied to obtain bands which are Neyman-shortest about each coefficient of the
regression hyperplane and to develop families of bands with flat boundaries. These latter
can be used (i) in their own right, (ii) as an approximation to the Neyman-narrowest bands,
or (iii) to facilitate construction of the latter bands. Asymptotic expressions for the distribution
have been obtained and short tables prepared.
(Abstracts of papers presented at the Kingston meeting of the Institute,
August 31-September 4, 1953)


19. Sequential Probability Ratio Confidence Sets. (Preliminary Report.)
Allan Birnbaum, Columbia University.


Let $x = (x_1, x_2, \dots)$ denote a sequence of observed values of a random variable distributed
according to $f(x, \theta)$. Let $T(\theta')$ denote the sequential probability ratio test of the
hypothesis $H(\theta')$: $\theta = \theta'$ against the hypothesis $H(\theta' + \Delta)$: $\theta = \theta' + \Delta$, at size $\alpha$ and power
$1 - \beta$, $\Delta > 0$, with operating characteristic denoted by $L_{\theta'}(\theta)$, for $\theta'$, $\theta' + \Delta$ and $\theta$ in $\Omega$,
with $\Omega$ a (possibly infinite) interval. Let $\theta''(x) = \inf \{\theta' \mid T(\theta') \text{ accepts } H(\theta') \text{ when } x \text{ is observed}\}$.
Under general conditions, $L_{\theta'}(\theta)$ is monotone for all $\theta'$ and $\theta''(x)$ is convex for
all x. Let $m(x, \theta')$ be the number of observations required for $T(\theta')$ to terminate when the
sequence x is observed. Let $n(x) = \sup \{m(x, \theta') \mid \theta' \in \Omega\}$; $n(x)$ and $\theta''(x)$ are functions of
the first $n(x)$ components of x only. $n(X)$ is finite with probability one. If $\theta$ is the true
parameter value, then $\Pr\{\theta''(X) \le \theta\} = 1 - \alpha$ and $\Pr\{\theta''(X) \le \theta - \Delta\} = \beta$. Hence the
assertion $J(X)$: "$\theta''(X) \le \theta < \theta''(X) + \Delta$" will be true with probability $1 - \alpha - \beta$. The
method has the advantages: (a) that it constructs confidence intervals of prescribed length
$\Delta$ and confidence coefficient $1 - \alpha - \beta$ without need to consider the distribution of any
statistics, and (b) that it is applicable to certain problems for which there seem to be no
alternative methods available. Application of the method to sequential tests of composite
hypotheses is being studied.


20. Optimum Sample Size for Choosing the Population Having the Smaller
Variance. Paul N. Somerville, University of North Carolina.


Assume we have k + 1 populations, normally distributed and with variances $1 \le \theta_1 \le \cdots \le \theta_k$.
Let it be required to select N individuals from one of the populations, where
those individuals that differ by more than an amount d from the mean are rejected, and
where the loss involved in rejecting an individual is r. Suppose we take a preliminary sample
of size n + 1 from each population, and select the population having the smallest sample
variance for the selection of the N individuals. Let the cost of the preliminary sample be
$c_1 n + c_0$. If we use the sample size which minimizes the maximum expected loss, then the
"optimum" sample size is an increasing function of $Nr/c_1$ provided $Nr/c_1$ is sufficiently large
with respect to $c_0$ that it is profitable to sample. Tables giving "optimum" sample sizes for
d = 1, 2, 3 for k = 1, k = 2, for various values of $Nr/c_1$ are given. (This research was supported
in part by the United States Air Force under Contract AF 18(600)-83.)


21. The Generation of Pseudo-Random Numbers on a Decimal Calculator.
Jack Moshman, Oak Ridge National Laboratory.


Let $p = 7^{4K+1}$ and define $p_0 = 1$. The digits of the sequence $p_{j+1} \equiv p_j \cdot p \pmod{10^s}$, under
certain conditions, fulfill specified tests for randomness and provide $5 \cdot 10^{s-3}$ ($s > 4$) decimal
numbers before repetition of the basic cycle. It is found that the five last significant digits
should be omitted from the number before application and there is an uncertainty about
the sixth. Relevant theorems from number theory are cited and experimental values of
$\chi^2$ for various tests are displayed for 10,000 generated numbers.
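
A minimal sketch of such a multiplicative congruential scheme follows. It assumes the recurrence reads $p_{j+1} \equiv p_j \cdot 7^{4K+1} \pmod{10^s}$ with $p_0 = 1$; the function names and the choices K = 3, s = 5 are hypothetical, and the sketch does not apply the digit-dropping step or any of the randomness tests mentioned in the abstract.

```python
def moshman_sequence(K, s, count):
    """First `count` terms of p_{j+1} = p_j * 7**(4K+1) mod 10**s, starting from p_0 = 1."""
    modulus = 10 ** s
    p = pow(7, 4 * K + 1, modulus)
    out, value = [], 1
    for _ in range(count):
        value = (value * p) % modulus
        out.append(value)
    return out


def cycle_length(K, s):
    """Number of steps before the sequence returns to its starting value 1."""
    modulus = 10 ** s
    p = pow(7, 4 * K + 1, modulus)
    value, steps = p, 1
    while value != 1:
        value = (value * p) % modulus
        steps += 1
    return steps


# Illustrative parameters (assumed here, not taken from the abstract).
K, s = 3, 5
print(moshman_sequence(K, s, 5))
print(cycle_length(K, s), 5 * 10 ** (s - 3))
```

For these parameters the cycle length agrees with $5 \cdot 10^{s-3}$. Other values of K can give shorter cycles (K = 1 does for s = 5), which is presumably part of what "under certain conditions" covers.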


22. The Integral Solution of Pearson's Random Walk Problem and Related
Matters. David Durand and J. Arthur Greenwood, National Bureau
of Economic Research and Manhattan Life Insurance Company.


Von Mises has shown that the vector mean of a sample of n from the population
$dp = (2\pi J_0(ik))^{-1} \exp(k \cos x)\, dx$ furnishes a significance test for the hypothesis k = 0. This
vector mean, multiplied by n, has the distribution found by Kluyver in 1906 as a solution
to Pearson's random walk problem. Kluyver's distribution is computed by quadratures
and tabled, for n = 6(1)24. Two series expansions of the distribution are considered; first,
an expansion in Laguerre functions essentially due to Pearson, and, second, an expansion
in descending powers of n. For n = 7, 14, 21, the goodness of approximation of these series
is compared. For use in significance tests, the 5 per cent and 1 per cent tail-area points of
the distribution are tabled. The expansion in descending powers of n is inverted for use in
extending the table of percentage points.


23. On Optimal Systems. David Blackwell, Howard University.


For any sequence $x_1, x_2, \dots$ of chance variables satisfying $|x_n| \le 1$ and
$E(x_n \mid x_1, \dots, x_{n-1}) \le -u \max(|x_n| \mid x_1, \dots, x_{n-1})$, where u is a fixed constant, $0 < u < 1$,
$\Pr\{x_1 + \cdots + x_n \ge t \text{ for some } n \ge 0\} \le [(1 - u)/(1 + u)]^t$ for all $t \ge 0$, with equality,
for integral t, when the $x_n$ are independent and $\Pr\{x_n = \pm 1\} = (1 \mp u)/2$. This result has
a simple interpretation in terms of gambling systems; a corollary is that for any chance 
variables 2; , 22, °:: satisfying = 2, S$ 1 and E(,i2, +--+; Za, 0, Pr 
{| mn (a, + --- +2,) | 2 eforsomen 2 Nj S (1 + €)-'*/"?*®, yielding Lévy’s result that 
$n^{-1}(x_1 + \cdots + x_n) \to 0$ with probability one.


24. Maximum Likelihood Regression Equations. H. Leon Harter, Wright- 
Patterson Air Force Base. 


Consider the application of the principle of maximum likelihood to the problem of de- 
termining the regression equation of one variable on p others. For a normal distribution 
of residuals, the maximum likelihood solution is the familiar least squares solution, found 
by minimizing the sum of squares of the residuals. For a Laplace distribution of residuals, 
the maximum likelihood solution is found by minimizing the sum of the absolute values of 
the residuals. For distributions of residuals with finite limits, only certain solutions are 
admissible. For the truncated normal distribution, the maximum likelihood solution is 
found by minimizing the sum of squares of residuals for the set of admissible solutions. For 
the truncated Laplace distribution, the maximum likelihood solution is found by minimizing 
the sum of absolute values of the residuals for the set of admissible solutions. For a rec- 
tangular distribution of residuals, the likelihood function is a constant, and there is no 
unique maximum likelihood solution, one admissible solution being just as likely as another.
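
The first two cases can be illustrated in the simplest regression, a constant fitted to the data, where least squares gives the sample mean and least absolute values gives the sample median. The sketch below is an illustration only: the data, the fixed scale parameters, and the grid search are assumptions made here, not part of the abstract.

```python
import math
import statistics


def normal_loglik(c, ys, sigma=1.0):
    """Log-likelihood of the constant c under normal residuals with known sigma."""
    n = len(ys)
    ss = sum((y - c) ** 2 for y in ys)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def laplace_loglik(c, ys, b=1.0):
    """Log-likelihood of the constant c under Laplace residuals with known scale b."""
    n = len(ys)
    sad = sum(abs(y - c) for y in ys)
    return -n * math.log(2 * b) - sad / b


# Hypothetical observations with one outlying value.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
grid = [i / 10 for i in range(0, 1001)]  # candidate constants 0.0, 0.1, ..., 100.0

best_normal = max(grid, key=lambda c: normal_loglik(c, ys))
best_laplace = max(grid, key=lambda c: laplace_loglik(c, ys))
print(best_normal, statistics.mean(ys))
print(best_laplace, statistics.median(ys))
```

The normal likelihood is maximized at the mean and the Laplace likelihood at the median, so the outlying observation moves the first fit far more than the second.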


25. Spherical Distributions. (Preliminary Report.) G. E. P. Box, Imperial 
Chemical Industries, Blackley, Manchester, England and North Carolina 
State College. 


A wide class of test criteria, including the $t$ test, analysis of variance test, Bartlett test, 
and tests of normality are independent of scale. Their null distributions are usually derived 
on the assumptions of independence, normality and equality of variance. Such test criteria 
follow the same null distribution and consequently the tests are equally valid under the 
less stringent conditions that the observations $y_1, y_2, \ldots, y_n$ follow what may be called a 
“spherical” distribution, that is, the contours of the joint density function are spheres: 

$$p(y_1, y_2, \ldots, y_n) = k f\left(\sum y^2\right), \qquad 0 < \sum y^2 < L,$$


where L may be infinite and k is chosen so that the integral taken over the whole space 
is unity. The condition for validity is necessary as well as sufficient. The y’s would not be 
independent except with the normal density function (J. Clerk Maxwell, Philos. Mag., 
Vol. 19 (1860), p. 19; M. S. Bartlett, “The vector representation of a sample,” Proc. Cambridge 
Philos. Soc., Vol. 30 (1934), pp. 327-340); nevertheless spherical distributions are 
important because (i) the spherical distribution, but not necessarily the normal distribution, 
is an approximation to the distribution generated by standard randomization procedures, 
(ii) the spherical distribution is generated exactly by the process of angular 
randomization, which can be used with certain multi-factor designs (G. E. P. Box, “Multi-factor 
designs of first order,” Biometrika, Vol. 39 (1952), pp. 49-57), and (iii) using a parent 
spherical distribution certain distribution problems may be attacked from a novel and use- 
ful angle. When the null-hypothesis is not true, the power of the test criterion is not inde- 
pendent of the function f chosen. However tests which are U.M.P. on the usual assumptions 
are also U.M.P. for any spherical distribution in which f is a decreasing function. 
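
The scale independence that drives this argument can be checked numerically. A spherical vector can be written as a random radius times a uniformly distributed direction, and a scale-free criterion such as the one-sample $t$ statistic depends on the direction only. The following is a minimal sketch, assuming the one-sample $t$ statistic for a zero mean as the criterion; the function names and the numbers used are hypothetical.

```python
import math
import random

def t_statistic(y):
    """One-sample t statistic for the hypothesis of zero mean."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean / math.sqrt(var / n)

def spherical_draw(n, rng, radius):
    """Uniformly distributed direction in n dimensions, scaled to the given radius."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in z))
    return [radius * v / norm for v in z]

rng = random.Random(0)
direction = spherical_draw(8, rng, radius=1.0)
stretched = [25.0 * v for v in direction]   # same direction, different radius

t_unit = t_statistic(direction)
t_stretched = t_statistic(stretched)        # agrees with t_unit up to rounding
```

Because the statistic is unchanged when the radius changes, its null distribution is the same whatever law the radius follows, that is, whatever function $f$ is chosen.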


26. On the Monotonic Character of the Power of a Certain Test in Multivariate 
Analysis of Variance. S. N. Roy, University of North Carolina. 


A test of the hypothesis $H_0$ of equality of means for $k$ $p$-variate normal populations 
(assumed to have the same dispersion matrix $\Sigma$) has been put forward (S. N. Roy, “On a 
heuristic method of test construction and its use in multivariate analysis,” Ann. Math. 
Stat., June, 1953) having the critical region $\theta_q \ge c$, where $\theta_q$ is the largest (necessarily positive) 
characteristic root of the matrix $S^* S^{-1}$, $S^*$ is the sample “between” covariance 
matrix, everywhere at least p.s.d. of rank $q = \min(p, k-1)$, $S$ is the sample “within” 
covariance matrix, everywhere p.d., and $c$ is given by $P(\theta_q \ge c \mid H_0) = \alpha$ (say). 
If we denote by $H$ the usual nonnull hypothesis and by $\Sigma^*$ the usual weighted “between” 
covariance matrix of the $k$ populations, it is well known, and has also been shown (see above 
reference), that the power of the critical region, that is, $P(\theta_q \ge c \mid H)$, is a function of just 
the characteristic roots (all nonnegative) of the matrix $\Sigma^* \Sigma^{-1}$. It is shown in the present 
paper that, for a given $c$, that is, $\alpha$, this power is a monotonically increasing function of 
each of the population characteristic roots, which incidentally proves that the proposed 
test is unbiased. 


27. Some Large-Sample Results on Estimation and Power for a Method of 
Paired Comparisons. (Preliminary Report.) Ralph A. Bradley, Virginia 
Polytechnic Institute. 


Certain large-sample results are obtained for a method of paired comparisons developed 
by Terry and the author (Biometrika Vol. 39 (1952), p. 324). In that paper a parameter 
$\pi_i$ is postulated for each of $t$ items or treatments with $\sum_i \pi_i = 1$ and each $\pi_i \ge 0$. It was 
further postulated that in a comparison of item $i$ with item $j$ the probability that item $i$ 
obtain rank 1 be $\pi_i/(\pi_i + \pi_j)$. A null hypothesis, $\pi_i = 1/t$ for all $i$, was tested against a class 
of alternatives using maximum likelihood estimates $p_i$ of $\pi_i$ and likelihood ratio tests. 
In the present paper formulas are developed for the variances of the estimators for large 
samples and confidence limits placed on $\pi_i$, $(\pi_i - \pi_j)$, and $(\log \pi_i - \log \pi_j)$. It is further 
shown that, if $\delta_i = \sqrt{n}\,(\pi_i - 1/t)$ and if $R$ is the likelihood ratio statistic for homogeneity 
of treatment ratings, then $-2 \log R$ has for large samples the distribution of a noncentral 
chi square with (t — 1) degrees of freedom and parameter \ = f° 3; 6;/4. The test is shown 
to be asymptotically more powerful than a multi-binomial test formulated and to have a 
relative efficiency, when compared with the analysis of variance, of $t/\{(t-1)\pi\}$. Illustrative 
examples in taste testing are given. (Research sponsored by the Bureau of Agricultural 
Economics, United States Department of Agriculture.) 


28. Nonparametric Estimation of Survivorship. Paul Meier, Johns Hopkins 
University. 


A standard problem of life testing in general and medical follow-up in particular is the 
estimation of the proportion of individuals surviving to time $T$, for example, the proportion 
of newly diagnosed cancer cases who survive 5 years. For the case of follow-up with unbiased 
losses it is shown that the simple limiting form of the usual “actuarial” estimate (taking 
the limit as the interval size goes to zero) is unbiased with variance well approximated by 
a formula proposed by Greenwood. The estimate is also derived as the maximum likelihood 
solution of the nonparametric estimation problem and the procedure is extended to the 
case of competing risks. Various methods of estimating survivorship are compared, with 
special reference to their sensitivity to biases in the data. 
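
As the interval width shrinks to zero, an actuarial estimate reduces to a product over the observed death times of the fraction surviving among those still at risk. The sketch below writes out that product-limit form together with Greenwood's variance approximation. It is an illustration only: whether it coincides in every detail with the estimate studied in the abstract is an assumption, the function name `product_limit` and the data are hypothetical, and individuals lost at a death time are treated as still at risk at that time.

```python
def product_limit(times, died):
    """Survival estimate with Greenwood's variance approximation.

    times: observed follow-up times
    died:  True if the individual died at that time, False if lost to follow-up
    Returns a list of (time, survival_estimate, approximate_variance).
    """
    data = sorted(zip(times, died))
    at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    results = []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaving = [d for (tt, d) in data if tt == t]
        deaths = sum(1 for d in leaving if d)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            if at_risk > deaths:
                greenwood_sum += deaths / (at_risk * (at_risk - deaths))
            results.append((t, survival, survival ** 2 * greenwood_sum))
        at_risk -= len(leaving)
        i += len(leaving)
    return results

# Hypothetical follow-up data: two individuals are lost (at times 2 and 4).
times = [1, 2, 2, 3, 4, 5]
died = [True, True, False, True, False, True]
estimate = product_limit(times, died)
```

With no losses the product telescopes to the ordinary sample proportion surviving, which is a quick sanity check on any implementation.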


29. Comparison of Two Rank Order Tests for the Two-Sample Problem. 
Gottfried E. Noether, Boston University. 


Recently, two rank order tests have been suggested for testing the hypothesis $H_0$ that 
two samples of sizes $m$ and $n$ come from the same continuous population, the alternative 
being that the two samples come from normal populations with different means, but common 
variance. Let $r$ stand for the ranks of the $n$ observations in the second sample in the over-all 
ranking of all $m + n = N$ observations. Then Terry’s test (Ann. Math. Stat., Vol. 23 (1952), 
pp. 346-366) is based on the statistic $c_1(R) = \sum_r E(Z_{N,r})$, where $Z_{N,r}$ is the $r$th order statistic 
in a sample of size $N$ from a standard normal population. Van der Waerden’s test (Nederl. 
Akad. Wetensch. Proc. Ser. A, Vol. 55 (1952), pp. 453-458) is based on the statistic 
$X = \sum_r \psi(r/(N+1))$, where $\psi(p)$ is the $p$-quantile of the standard normal distribution. On the 
basis of examples, it is easily shown that the two tests do not always lead to the same decision. 
However, when $H_0$ is true, the correlation coefficient between $c_1(R)$ and $X$ tends 
to 1 as $N$ increases, and the two tests are asymptotically equivalent. 
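
Van der Waerden's statistic is simple to compute directly from its definition. The sketch below assumes no tied observations (as under a continuous population); the function name and the sample values are hypothetical. Terry's statistic is not computed here, since it needs expected values of normal order statistics.

```python
from statistics import NormalDist

def van_der_waerden_statistic(first, second):
    """Sum of standard normal quantiles at r/(N+1) over ranks r of the second sample."""
    pooled = sorted(first + second)
    total = len(pooled)                      # N = m + n
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # assumes no ties
    quantile = NormalDist().inv_cdf
    return sum(quantile(rank[value] / (total + 1)) for value in second)

# Hypothetical samples.
x_balanced = van_der_waerden_statistic([1.0, 4.0], [2.0, 3.0])   # ranks 2 and 3
x_shifted = van_der_waerden_statistic([1.0, 2.0], [3.0, 4.0])    # ranks 3 and 4
```

When the second sample occupies ranks placed symmetrically about the middle, the quantiles cancel and the statistic is zero; when it occupies the top ranks, the statistic is positive.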


30. The Poisson Distribution as a Limit of Dependent Binomial Distributions 
with Unequal Probabilities. John E. Walsh, U. S. Naval Ordnance 
Test Station, Inyokern. 


It is well known that the Poisson probability distribution approximates the binomial 
probability distribution for situations where the sample size $n$ is large and the probability 
of “success” small. This result was extended to the case of $n$ independent binomial events 
with possibly different probabilities for “success” by B. O. Koopman (“Necessary and 
sufficient conditions for Poisson’s distribution,” Proc. Amer. Math. Soc., Vol. 1 (1950), 
pp. 813-823). This paper presents a further extension which appears to be of practical 
interest and where an event is not required to be statistically independent of all the $n - 1$ 
other events. Roughly stated, the limiting conditions assumed are: First, each event is 
statistically independent of at least $n - m - 1$ of the other events and $m/n \to 0$ as $n \to \infty$. 
Second, although the conditional probability of “success” for an event can be greatly 
changed in ratio by knowledge of the outcomes for other events, $m$ times this probability 
tends to zero as $n \to \infty$. Third, the sum over all events of the unconditional probabilities 
of “success” converges to a finite value as $n \to \infty$. An approximate form of these conditions 
for large but finite m is presented along with an outline of a general method of intuitive 
verification for practical applications. 
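
The independent case with unequal probabilities, which is the starting point for the extension described above, can be checked numerically. The sketch below builds the exact distribution of the number of successes by convolution and compares it with the Poisson distribution having the same mean. It does not cover the dependent case; the function names and the probabilities used are hypothetical.

```python
import math

def sum_of_bernoullis_pmf(probs):
    """Exact distribution of the number of successes among independent events."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def poisson_pmf(lam, k_max):
    """Poisson probabilities for 0..k_max, built by recurrence to avoid large factorials."""
    pmf = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def total_variation_to_poisson(probs):
    """Total variation distance between the exact law and the matching Poisson law."""
    exact = sum_of_bernoullis_pmf(probs)
    approx = poisson_pmf(sum(probs), len(probs))
    body = sum(abs(e - q) for e, q in zip(exact, approx))
    tail = max(0.0, 1.0 - sum(approx))   # Poisson mass beyond the number of events
    return 0.5 * (body + tail)

# Hypothetical case: 200 events with success probabilities 0.005 and 0.01 alternating.
probs = [0.005 if i % 2 == 0 else 0.01 for i in range(200)]
distance = total_variation_to_poisson(probs)
```

For many events with small probabilities the distance is small, in line with the approximation described in the abstract.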


31. An Estimate of the Number of States in a Discrete Markov Chain. A. T. 
Reid, University of Chicago. 


In this note we point out a way of obtaining an estimate of the number of states in a 
discrete Markov chain with two absorbing states. Let us call the states $E_0, E_1, \ldots, E_a$, 
and define transition probabilities $r_{i,i+1} = i/a$, $r_{i,i-1} = 1 - i/a$ $(i = 1, \ldots, a-1)$, and 
$r_{ii} = 1$ $(i = 0, a)$. The above chain (with $E_0$ and $E_a$ representing recovery and death of an 
irradiated organism) has been used as a model in radiobiology (Bull. Math. Biophysics, 
Vol. 13 (1951), pp. 153-163). It is of interest to obtain an estimate of $a$ from the experimentally 
observed times required for the system to enter either $E_0$ or $E_a$. Suppose we observe 
the system on $n$ occasions, $m$ of which it ends up in $E_0$. Call $t_{j0}$ $(j = 1, \ldots, m)$ the 
time required for the system to enter $E_0$ if initially it was in $E_1$. Assuming $\alpha$ steps or transi-
tions per unit time let $k_{j0}$ $(= \alpha t_{j0})$ represent the number of steps required for the system 
to pass from $E_1$ to $E_0$. Now $r_{10}^{(k_{j0})}$ is the probability of going from $E_1$ to $E_0$ in $k_{j0}$ steps. The 
likelihood function is $P(k_{10}, \ldots, k_{m0}; a) = \prod_j r_{10}^{(k_{j0})}$. Put $\xi = \sum_j k_{j0}$, then 
$L = \ln P = \ln r_{10}^{(\xi)}$, where $r_{10}^{(\xi)}$ is the element in the first column and second row of the matrix $R$ 
raised to the $\xi$-th power. Since $R$ is decomposable we have $r_{10}^{(\xi)} = r_{11}^{(\xi-1)} r_{10}$. The probability 
$r_{11}^{(\xi-1)}$ can be calculated using algebraic methods. Differentiating $L$ with respect to $a$, and 
proceeding in the usual manner we can obtain a maximum likelihood estimate of the number 
of states. 
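
The multi-step probabilities entering the likelihood are entries of powers of the transition matrix and can be computed directly. The sketch below is an illustration under stated assumptions: the transition probabilities are read here as $i/a$ for a step up and $1 - i/a$ for a step down from an interior state $E_i$, with $E_0$ and $E_a$ absorbing, and the function names are hypothetical. It returns the entry in the second row and first column of the $k$-th power of the matrix.

```python
def transition_matrix(a):
    """Matrix R for states E_0..E_a (assumed form: up with i/a, down with 1 - i/a)."""
    R = [[0.0] * (a + 1) for _ in range(a + 1)]
    R[0][0] = 1.0          # E_0 absorbing
    R[a][a] = 1.0          # E_a absorbing
    for i in range(1, a):
        R[i][i + 1] = i / a
        R[i][i - 1] = 1.0 - i / a
    return R

def mat_mul(A, B):
    """Plain matrix product of two square matrices."""
    size = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(R, k):
    """k-th power of R by repeated multiplication (k >= 0)."""
    size = len(R)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = mat_mul(result, R)
    return result

def prob_in_e0_after(a, k):
    """Entry (second row, first column) of R^k: in E_0 after k steps, starting from E_1."""
    return mat_pow(transition_matrix(a), k)[1][0]

one_step = prob_in_e0_after(4, 1)
three_steps = prob_in_e0_after(4, 3)
```

Evaluating such entries over a range of candidate values of $a$ gives a numerical route to the likelihood when the algebraic calculation is inconvenient.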


32. On a Test of the Rank of a Matrix of Means for k p-variate Normal Popu- 
lations. S. N. Roy, University of North Carolina. 


Suppose we have random samples of sizes $n_r$ $(r = 1, 2, \ldots, k)$ from $k$ $p$-variate normal 
populations with a common dispersion matrix $\Sigma$. Let $\Sigma^*$ be the weighted raw “between” 
covariance matrix of the $k$ populations, that is, the covariance matrix of means (without 
reducing to the grand means), and $S^*$ be the raw “between” and $S$ the “within” covariance 
matrices of the $k$ samples. Almost everywhere, $S^*$ is at least p.s.d. of rank $q = \min(p, k)$ 
and $S$ is p.d. Also $\Sigma$ is p.d. and $\Sigma^*$ is at least p.s.d. of rank, say, $r \le \min(p, k)$. To test 
the hypothesis that $\Sigma^*$ is of a specific rank $r$, that is, the $p \times k$ matrix of means is of rank 
$r$, the technique of an earlier paper (S. N. Roy, “On a heuristic method of test construction 
and its use in multivariate analysis,” Ann. Math. Stat., June, 1953) is used, leading to a 
critical region in terms of the characteristic roots of the matrix $S^* S^{-1}$, which are all nonnegative 
and of which $q$ roots are, almost everywhere, positive. Some properties of the test 
are also discussed. 


33. On the Monotonic Character of the Power of a Test of Independence in 
Multivariate Analysis. S. N. Roy, University of North Carolina. 


A test has been offered (S. N. Roy, “On a heuristic method of test construction and its 
use in multivariate analysis,” Ann. Math. Stat., June, 1953) for the hypothesis $H_0$ of independence 
of two sets of variates, $p$ and $q$ in number (with a joint $(p+q)$-variate normal 
distribution), the critical region of the test being $\theta_p \ge c$, where $p \le q$, $\theta_p$ is the largest 
characteristic root of the sample matrix $S_{11}^{-1} S_{12} S_{22}^{-1} S_{12}'$, $S_{11}$ and $S_{22}$ are the sample 
covariance matrices of the $p$-set and the $q$-set, $S_{12}$ is the sample covariance matrix 
between the $p$-set and the $q$-set, and $P(\theta_p \ge c \mid H_0) = \alpha$ (say). Almost everywhere, 
the $(p+q)$th order sample covariance matrix is p.d. and $S_{12}$ is of rank $p$, and thus all the 
$p$ characteristic roots are positive. If we denote by $\Sigma_{11}$, $\Sigma_{22}$ and $\Sigma_{12}$ the corresponding 
population matrices, then, assuming that the $(p+q)$th order population covariance matrix is p.d., 
all the $p$ characteristic roots of the population matrix $\Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{12}'$ are nonnegative. It 
is well known and has also been shown (see above reference) that the power of the test, 
that is, $P(\theta_p \ge c \mid H)$, is a function of just the characteristic roots of the population matrix 
$\Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{12}'$. It is shown in the present paper that it is a monotonically increasing function 
of each of these characteristic roots, which incidentally proves that the test is an 
unbiased one. 


34. The Asymptotic Variance of Estimates of the Mean Life of a Radioactive 
Source. (Preliminary Report.) Richard F. Link, Princeton University. 


Suppose the number of particles which disintegrate in $K$ nonoverlapping time intervals 
of equal length $t$ is recorded. Suppose further that these particles come from a source with 
mean life $\tau$ or from a background of constant intensity $a$. The intensity of the background 
may or may not be known. Let $\tau$ be estimated by the method of maximum likelihood. The 
asymptotic variance of this estimate is calculated for several values of: $K$, the number of 
time intervals; $Kt/\tau$, the number of mean lives the source is observed; and $a$, the intensity 
of the background. 


35. Testing the Equality of Means of Rectangular Populations. Robert V. 
Hogg, State University of Iowa. 

Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ be ordered samples from two rectangular populations 
having equal ranges but possibly different means. Using the likelihood ratio criterion 
we find that the statistic $t = \max(x_n - x_1, y_m - y_1)/[\max(x_n, y_m) - \min(x_1, y_1)]$ is used 
to test the hypothesis that the two means are equal. To find the distribution of this ratio 
under the null hypothesis we proceed as follows. First, show that the ratio and its denomi- 
nator are stochastically independent using an extension of a theorem of Neyman concerning 
the independence of sufficient statistics and statistics whose distributions do not involve 
the parameters. Second, calculate the moments of the numerator and of the denominator 
and thus, by dividing, those of the ratio. Third, observe from these moments the interesting 
fact that the distribution function of the ratio is a combination of the discrete and continuous 
types; namely $0$, $t < 0$; $2nm\,t^{n+m-2}/[(n+m)(n+m-1)]$, $0 \le t < 1$; $1$, $1 \le t$. This 
problem is extended to more than two populations. 
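
The statistic and its null distribution function can be written out in a few lines. This is a sketch only, reading the distribution function as $2nm\,t^{n+m-2}/[(n+m)(n+m-1)]$ on $0 \le t < 1$ with a jump to 1 at $t = 1$; the function names and the sample values are hypothetical.

```python
def range_ratio(x, y):
    """Larger of the two sample ranges divided by the range of the pooled sample."""
    numerator = max(max(x) - min(x), max(y) - min(y))
    denominator = max(max(x), max(y)) - min(min(x), min(y))
    return numerator / denominator

def null_cdf(t, n, m):
    """Distribution function of the ratio under equal means (mixed discrete/continuous)."""
    if t < 0:
        return 0.0
    if t >= 1:
        return 1.0
    return 2 * n * m * t ** (n + m - 2) / ((n + m) * (n + m - 1))

# Hypothetical samples: one sample contains both pooled extremes, so the ratio is 1.
ratio_nested = range_ratio([0.0, 1.0], [0.2, 0.5])
# Here the pooled extremes come from different samples, so the ratio is below 1.
ratio_offset = range_ratio([0.0, 0.5], [0.4, 1.0])

# Size of the jump at t = 1 for n = m = 2.
jump_at_one = 1.0 - 2 * 2 * 2 / ((2 + 2) * (2 + 2 - 1))
```

The jump at $t = 1$ is the probability that the smallest and largest pooled observations fall in the same sample, which is where the discrete part of the distribution comes from.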


36. Structure of the Sample Space for Group Organization Theory. Leo Katz 
and James H. Powell, Michigan State College. 


An organization of n individuals is connected by directional bonds between pairs. Any 
particular configuration of the bonds produces a directed graph. In the social psychological 
applications, certain functions of graphs are employed. These are random variables over 
the graphs, considered as points in an appropriate sample space. The context of the particu- 
lar psychological investigation may induce a conditioning of the sample space. In the null 
case, each directed graph of { joins on n nodes is equally likely. These are partitioned dis- 
jointly and exhaustively, first, by point-wise restrictions on the outgoing lines and, second, 
by further point-wise restrictions on incoming lines. An unpublished result of Katz and 
Powell on graph theory gives the number of points in a second-order subspace. The second-order 
subspaces in a first-order space are obtained by standard combinatorial methods. In 
special cases, many second-order spaces are isomorphic in sets; consequently, calculations 
may be materially abridged. Applications are given to classes of unsolved problems of 
group organization theory. (Work sponsored by Office of Naval Research). 


37. A Family of Cumulative Frequency Functions for J-shaped Frequency 
Functions. C. W. Topp and F. C. Leone, Case Institute of Technology. 


A three-parameter family of cumulative frequency functions is presented. These cover a 
wide variety of J-shaped frequency functions. Upon testing a number of J-shaped empirical 
distributions it was found that many of these had third and fourth moments within the 
range covered by this family of curves. These empirical distributions are composed primarily 
of life testing and failure data. A graph of $\alpha_3$ and $\delta$ has been prepared. This includes 
some members of the family of frequency functions for different values of the parameters. 
After a proper choice of a specific function from the graph, the cumulative frequency function 
can be used directly for the chi square test of goodness of fit. 


38. Multilayer Significance Procedures. (Preliminary Report.) John W. Tukey, 
Princeton University. 


Even when satisfactory confidence procedures are available for multiple comparisons, 
there is some real need, and more supposed need, for significance procedures. The framework 
of multilayer significance procedures includes the procedures proposed by Fisher, Newman, 
Duncan, Tukey (2 versions, one with minor modification), Keuls, Nandi, and Cornfield 
and co-workers (at least in part) among others (it is easy to imagine procedures which do 
not fit into its framework) and is patterned after Duncan’s recent discussion. Conceptually, 
at least, it involves testing every subgroup of the $k$ determinations for apparent significance. 
A subgroup which, together with all subgroups which include it, is apparently significant, 
is adjudged significant. Various kinds of levels of significance are defined. As John Mandel 
pointed out in connection with the author’s first multiple comparison procedure, the over-all 
null hypotheses most likely to cause error are those in which the determinands (“true” 
values of the determinations) are equal in pairs, these common values being widely separated. 
If $\mathrm{WSD}_j$ is the allowance for $j$ determinations based on the studentized range, then 
$\tfrac{1}{2}(\mathrm{WSD}_h + \mathrm{WSD}_k)$ may be used to test the range of subgroups of $h$ from a group of $k$ without 
exceeding the nominal error rate. A gap procedure may be added. 


39. Estimation in Truncated Multivariate Normal Distributions. A. C. Cohen, 
Jr., University of Georgia. 


This paper represents an extension of results which the author presented before a joint 
meeting of I.M.S. and the Biometric Society in Washington, D. C., April 30, 1953, concerning 
bivariate normal distributions. Maximum likelihood estimators are derived for parameters 
of a multivariate normal population which are functions of a random sample in which one of 
the variates has been subjected to a truncation at known terminals. Single and double 
truncations both for known and unknown numbers of eliminated observations are con- 
sidered. The estimators are reduced to simple algebraic forms for easy application to practi- 
cal problems. Asymptotic variances and covariances of the estimates are obtained from the 
likelihood information matrices. 


40. The Extrema of Certain Functionals of Distribution Functions. (Preliminary 
Report.) Wassily Hoeffding, University of North Carolina. 


Two types of problems are considered. Problem I. Let 

$$\phi(F) = \int \cdots \int K(x_1, \ldots, x_n)\, dF(x_1) \cdots dF(x_n),$$

where $F$ is a cumulative distribution function (cdf) on the real line and $K$ a given function. 
Let $D$ be the class of all cdf’s $F(x)$ with $\int x^i\, dF(x) = c_i$, $i = 1, \ldots, r$, 
where $c_1, \ldots, c_r$ are given numbers. To determine $\sup_{F \in D} \phi(F)$. Problem II. Let 
$F = (F_1, \ldots, F_n)$ be a vector of $n$ cdf’s on the real line, 

$$\phi(F) = \int \cdots \int K(x_1, \ldots, x_n)\, dF_1(x_1) \cdots dF_n(x_n),$$

and let $V$ be the class of vectors $F$ with $\int x^i\, dF_j(x) = c_{ij}$, $i = 1, \ldots, r$; 
$j = 1, \ldots, n$, where the $c_{ij}$ are given. To determine $\sup_{F \in V} \phi(F)$. For Problem II it is 
found that if $\phi(F)$ is continuous with respect to the metric $d(F, G) = \max_j \sup_x |F_j(x) - G_j(x)|$, 
then $\sup_{F \in V} \phi(F) = \sup_{F \in V_{r+1}} \phi(F)$, where $V_{r+1}$ is the subclass of $V$ where the components 
of $F$ are step functions with at most $r + 1$ steps. For Problem I a similar reduction 
to discrete distributions is possible only under more restrictive assumptions. Preliminary 
results for certain cases where K takes on the values 0 or 1 have been obtained. The results 
permit a sharpening of inequalities of the Tchebycheff type when the chance variable in- 
volved is assumed to be a sum of independent chance variables. 


41. Probability Distributions Related to Random Transformations of a Finite 
Set. (Preliminary Report.) Herman Rubin and Rosedith Sitgreaves, 
Stanford University. 


Let $X$ be a set of $n$ elements, and let $\mathcal{T}$ be a set of transformations of $X$ into $X$. For a 
given $x \in X$ and $T \in \mathcal{T}$, the smallest set of elements $y \in X$ closed under $T$ and including $x$ 
is called the structure in $T$ containing $x$. 
Let $m$ be the number of structures in $T$, $c$ be the number of elements in the structure 
containing $x$, $s$ be the number of successors of $x$ (including $x$), and $p$ the number of predecessors 
of $x$ (including $x$). We assume that elements are selected at random from $X \times \mathcal{T}$, 
each pair $(x, T)$ having probability $1/nt$ of being chosen, where $t$ is the number of transformations 
in $\mathcal{T}$. For each of these functions, exact probability distributions, together with asymptotic 
expressions for these probabilities as $n$ becomes large, have been found when $\mathcal{T}$ is the 
set of all transformations of $X$ into $X$, and when $\mathcal{T}$ is restricted to transformations for which 
an element $x \in X$ has either zero or $k$ immediate predecessors. Asymptotic expressions for 
several of these probability distributions have been obtained when $\mathcal{T}$ is the set of transformations 
for which an element $x$ has no more than $k$ immediate predecessors. 
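
The quantities counted above are easy to compute for any particular transformation. In the sketch below a transformation of a set of size n is stored as a list T with T[x] the image of x. One point is an assumption: the structure containing x is read here as the connected component of x, that is, the set reached by following T both forwards and backwards, since closure under T alone would give the set of successors. The function names and the example transformation are hypothetical.

```python
def successors(T, x):
    """Elements reached from x by repeated application of T (including x)."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = T[x]
    return seen

def predecessors(T, x):
    """Elements from which x is reached by repeated application of T (including x)."""
    return {y for y in range(len(T)) if x in successors(T, y)}

def structure(T, x):
    """Connected component of x (assumed meaning of 'structure')."""
    inverse = [[] for _ in T]
    for y, image in enumerate(T):
        inverse[image].append(y)
    component = {x}
    stack = [x]
    while stack:
        v = stack.pop()
        for w in [T[v]] + inverse[v]:
            if w not in component:
                component.add(w)
                stack.append(w)
    return component

def number_of_structures(T):
    """Number of distinct connected components of the transformation."""
    return len({frozenset(structure(T, x)) for x in range(len(T))})

# Hypothetical transformation on {0,...,5}: a cycle 0 -> 1 -> 2 -> 0 with 3 -> 0,
# and a second component 4 -> 5 -> 5.
T = [1, 2, 0, 0, 5, 5]
s = len(successors(T, 3))
p = len(predecessors(T, 0))
c = len(structure(T, 3))
m = number_of_structures(T)
```

Drawing T uniformly from all transformations and x uniformly from the set, and tabulating these counts over many draws, gives an empirical check on the distributions described in the abstract.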


42. Characterization of Tolerance Regions. D. A. S. Fraser, University of 
Toronto. 


A distribution-free or nonparametric tolerance region for a class of distributions 
$\{P_\theta \mid \theta \in \Omega\}$ over $\mathcal{X}(\mathcal{A})$ is defined as a mapping $S(x_1, \ldots, x_n)$ from $\mathcal{X}^n$ into $\mathcal{A}$ for which the 
distribution of $P_\theta(S(X_1, \ldots, X_n))$ induced by the distribution $P_\theta$ for each $X_i$ is independent 
of $\theta \in \Omega$. If $\varphi_y(x_1, \ldots, x_n)$ is the characteristic function of the set $S(x_1, \ldots, x_n)$, then 
a necessary and sufficient condition that $S(x_1, \ldots, x_n)$ be distribution-free is that there 
be a sequence $a_1, a_2, \ldots$ such that $\varphi_{x_{n+1}}(x_1, \ldots, x_n) - a_1$, $\varphi_{x_{n+1}}(x_1, \ldots, x_n)\,\varphi_{x_{n+2}}(x_1, \ldots, x_n) - a_2$, 
$\ldots$ are unbiased estimates of zero over $\mathcal{X}^{n+1}, \mathcal{X}^{n+2}, \ldots$. 


43. A Nonparametric Model for the Linear Hypothesis. D. A. S. Fraser, 
University of Toronto. 


If the errors of a linear hypothesis design are assumed to have a spherically symmetric 
joint distribution, then an orthogonal rotation putting the problem in canonical form per- 
mits the use of standard methods: rank tests, most powerful tests for specific alternatives, 
or tests based on substitution of order statistics from an independent normal sample. 


44. On the Analysis of Diurnal Fluctuations in Physiological States and Per- 
formance. (Preliminary Report.) E. Christine Karis, Illinois Institute 
of Technology and University of Chicago. 


In studies dealing with the quantitative relationship between physiological states and 
work performed on various tasks over a period of time, the problem of locating optimal periods 
for testing arises. Previous studies have utilized either the method of constant interval 
or random time sampling. We know, however, that certain physiological states exhibit 
periodic variations adapted to some cycle; for example, body-temperature fluctuations 
coincide with the day-and-night as well as with the menstrual cycle. Now if variations in 
work-output and performance on tests can be related to these known metabolic cycles, pre- 
diction of low and high points on a person’s curve should be possible in terms of expected 
changes which can be calculated from the beginning of an iterative series to later portions 
of it. In this vein a time series analysis of diurnal variations in body-temperature, heart- 
rate and a work-output test based on five measures per day was undertaken upon data 
gathered on a female subject over a period of three months. For this a punch-card technique 
was used. Significant auto- and cross correlations are indicated. In addition a two-way 
analysis of variance showed the following: the within day variation is five times greater 
than the day to day variation in all three variables, although both are significant at the 
.01 level. The greater part of the variation is contributed by the consistently lower measures 
obtained in the early morning and late night hours. Since the peak periods differ over the 
middle of the day in the three variables, no one best testing time for all can be established. 


It is however apparent that time-portions of established cycles can be used as indices of 
known conditions of variation 
NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest. 


Personal Items 


Dr. Helen P. Beard will be on leave of absence from Newcomb College during 
the year 1953-1954 and will be at the Statistical Laboratory, University of 
California. 

Dr. Robert E. Bechhofer, formerly Assistant Professor of Industrial Engineer- 
ing and Director of the Statistical Consulting Service of the Department of 
Mathematical Statistics, Columbia University, has been appointed Associate 
Professor in the Department of Industrial and Administrative Engineering, 
Sibley School of Mechanical Engineering, Cornell University, Ithaca, New 
York. 

Dr. Julius R. Blum has been appointed to an instructorship in the Depart- 
ment of Mathematics, Indiana University, Bloomington, Indiana. 

Dr. Lyle D. Calvin, formerly Biometrician with G. D. Searle & Co., Chicago, 
has received his Ph.D. degree in experimental statistics from North Carolina 
State College, and has accepted the position of Experiment Station Statistician 
at Oregon State College, Corvallis. 

Benjamin Caplan has transferred from the Council of Economic Advisors to 
the Office of Defense Mobilization. 

Visiting Associate Professor K. L. Chung of Cornell University has been ap- 
pointed to an Associate Professorship at Syracuse University. 

Phelps P. Crump, formerly at North Carolina State College, has accepted 
the position of Statistician on the staff of the Biology Department, Brookhaven 
National Laboratory, Upton, L. I., New York. 

Masil B. Danford has completed his graduate program at the Institute of 
Statistics, North Carolina State College, Raleigh, and has taken a position as 
a Consulting Statistician with the Air Forces’ School of Aviation Medicine at 
Randolph Field, Texas. 

Reed B. Dawson, Jr., after a year of temporary additional duty studying sta- 
tistics at Harvard, has returned to the National Security Agency, Washington, 
D.C. 

C. West Churchman has been appointed Professor of Engineering Adminis- 
tration and Director of Operations Research in the Engineering Administration 
Department at Case Institute of Technology, Cleveland, Ohio. 

Paul Gunther (ex-Paul Gutt) has resigned from the Institute for Air Weapons 
Research, University of Chicago, to accept a position as Statistician with the 
Engineering Measurement and Analysis Services Department of the General 
Engineering Laboratory, General Electric Company, Schenectady, New York. 

Daniel G. Horvitz, Assistant Professor, Department of Biostatistics, Uni- 
versity of Pittsburgh, received his Ph.D. degree in statistics in March 1953 
from Iowa State College. He joins the staff of the Department of Experimental 
Statistics, North Carolina State College as Associate Professor September 1. 
Walter W. Hoy has left Ohio State University and has accepted a position 
with Chance Vought Aircraft as a Senior Project Engineer doing work in sta- 
tistics. 

David Huntsberger was promoted from Instructor to Assistant Professor in 
statistics at Iowa State, effective July 1. 

Professor Stanley L. Isaacson is on leave of absence from his position as 
Assistant Professor of Statistics at Iowa State College from July, 1953 to July, 
1954. During this period he will be visiting Associate Professor and Research 
Associate in the Applied Mathematics and Statistics Laboratory at Stanford 
University. 

Dr. N. L. Johnson has returned to University College, London, after spending 
a year as Visiting Associate Professor at the Institute of Statistics, University 
of North Carolina. 

Robert M. Kozelka, formerly teaching at Tufts College, Medford while finish- 
ing his work for the doctorate at Harvard under the direction of Professor Fred- 
erick Mosteller, has accepted an Assistant Professorship of mathematics at the 
University of Nebraska, Lincoln. 

H. T. McAdams has accepted a position as Research Physicist at Cornell 
Aeronautical Laboratory, Buffalo. He was formerly employed as Research 
Chemist at Aluminum Research Laboratories, E. St. Louis, Illinois. 

Professor W. G. Madow of the University of Illinois has received a grant 
from the Fund for the Advancement of Education and will spend the year at 
Princeton University. 

Roger H. Moore has accepted a position of Research Assistant with the Los 
Alamos Scientific Laboratory at Los Alamos, New Mexico. 

A. Carl Nelson, Jr. has completed his studies toward a Ph.D. in mathematical 
statistics at the University of North Carolina and returned to teach statistics 
in the Department of Mathematics, University of Delaware. 

Sidney I. Neuwirth, formerly affiliated with the Research Division of Schering 
Corporation, Bloomfield, New Jersey, has accepted the position of Biometrician 
with the Committee on Research of the American Medical Association, Chicago. 

Dr. Emanuel Parzen has been appointed to the staff of the Hudson Labora- 
tories of Columbia University, New York. 

George W. Snedecor returned to the Statistical Laboratory, Iowa State 
College, in July after serving as consultant in experimental statistics for six 
months under a grant from the General Education Board—mainly at Alabama 
Polytechnic Institute and the University of Florida. His appointment as con- 
sultant was part of a cooperative program of statistics among the southeastern 
states supported by the Institute of Statistics of the Consolidated University 
of North Carolina. 

E. Webb Stacy is an Operations Analyst at Headquarters Eastern Air De- 
fense Force, Stewart Air Force Base, Newburgh, New York. 

Professor Rothwell Stephens has returned to Knox College after spending 
a year at Princeton University on a Ford Faculty Fellowship. 

Donovan J. Thompson, Assistant Professor of Statistics at Iowa State Col- 
lege, has assumed the position of Assistant Professor, Department of Biosta- 
tistics, University of Pittsburgh. 

William R. Thompson, Senior Biochemist in the Division of Laboratories 
and Research, New York State Department of Health, was awarded the Alfred 
E. Smith prize for outstanding professional accomplishment during the previous 
year by the Albany Chapter of the Public Administration Society at their an- 
nual dinner in May. The award was made in recognition of Dr. Thompson’s 
contribution to the field of public health by his work in mathematical statistics 
and his application of statistical methods to biological problems. 

David L. Wallace is a Moore Instructor in Mathematics at the Massachusetts 
Institute of Technology. 

Louis H. Wegner, Jr., formerly Teaching Fellow at the University of Oregon, 
has accepted a position in the Aviation Division of The Rand Corporation. 

H. Weingarten has been since September, 1952 a Mathematical Statistician 
and Head of Statistical Methodology Section, Quality Control Division, Bureau 
of Ordnance, Navy Department. 



Council Resolution Regarding Mina Rees 


Dr. Mina Rees has resigned from her position as Chief of the Division of 
Mathematical Sciences of the Office of Naval Research, effective September, 
1953, and assumes the position of the Dean of the Faculty at Hunter College, 
New York City. The Council of the Institute of Mathematical Statistics feels 
that her departure from the Office of Naval Research should not pass 
unmarked. 

During the last war it became apparent that the state of science and tech- 
nology of a country is a paramount factor in its survival. Hence, the end of 
World War II was followed by an epoch of peace-time Federal support of sci- 
ence, first through the activities of the Office of Naval Research, then through 
the activities of similar organizations in the two other armed services and, 
finally, through the birth and development of the National Science Foundation. 
Thus, the Office of Naval Research did the pioneer work in Federal support of 
science. 

Naturally, the armed services are most obviously interested in military re- 
search, hence in applied research. The Office of Naval Research has showed by 
example that it understood that, to be effective, applied research must be pre- 
ceded by fundamental research. In order to have appropriate scientific person- 
nel in case of war and in order to obtain without great delay the solutions of 
the various military-scientific problems arising from a war, it is necessary to 
train personnel in advance and to build a reservoir of new scientific results and 
new methods. 

Under Dr. Rees’ leadership the Division of Mathematical Sciences of the 
Office of Naval Research gave wholehearted support to basic research, in par- 
ticular to basic research in mathematical statistics and probability. The whole 
action was conducted with remarkable foresight and wisdom. Basic research 
has always meant basic research, unhampered by possible demands that it be 
of immediate usefulness to the Navy. As a result, the long range interests of the 
Navy and of the whole Nation were effectively served. The fruits of this activity 
have already been many and important and will continue to appear for many 
years to come. 

The great demand for trained mathematical statisticians existed before and 
during the war, and still persists. As noted by the Committee of the National 
Research Council in the 1940’s, there were at the time only a very few centers 
of instruction capable of training Ph.D.’s in mathematical statistics. Now the 
number of such centers has increased substantially. The Office of Naval Re- 
search basic research projects in these centers employ a great number of young 
men who thus obtain the training necessary for research and teaching. 

The postwar development of mathematical statistics in the United States 
owes a great deal to the farsighted policy of the Office of Naval Research ably 
administered by Dr. Rees. Mathematical statistics owes Mina Rees a public 
“well done,” and extends its best wishes to her successor at the Office of Naval 
Research. 



Cooperative Graduate Summer Sessions in Statistics 


North Carolina State College, the University of Florida, Virginia Polytechnic 
Institute and the Southern Regional Education Board will jointly sponsor co- 
operative graduate Summer Sessions in Statistics, the first session to be held at 
Virginia Polytechnic Institute, June 9 through July 17, 1954. The long range 
plan provides for sessions each summer at the cooperating institutions in rota- 
tion with the possible later addition of other institutions of the South. The ses- 
sions are designed to be of particular interest to research and professional work- 
ers in various fields, teachers of elementary statistics, professional statisticians 
and graduate students both in statistics and in other fields. For the benefit of 
students, the courses will carry graduate credit and will have a continuity of 
subject matter over successive summers. 

The first session will include the following courses: Multivariate Analysis by 
Professor Maurice Kendall, Quantitative Genetics by Dr. Ralph Comstock, 
Probability and Inference, Analysis of Variance, Statistical Methods, Engi- 
neering Statistics, Education Statistics, Rank Order Statistics and Theory of 
Sequential Methods. The latter courses will be given by the resident staff, R. A. 
Bradley, D. B. Duncan, M. C. K. Tweedie, P. M. Somerville and Boyd Harsh- 
barger. In addition, special seminars will be directed by outstanding statistical 
scholars and advanced courses in agriculture, science and engineering will be 
available. 

For more detailed information, inquiries should be addressed to Boyd Harsh- 
barger, Head, Department of Statistics, Virginia Polytechnic Institute, Blacks- 
burg, Virginia. 
University of Chicago Post-Doctoral Fellowships 


Three $4,000 post-doctoral fellowships in Statistics are offered for 1954-55 by 
the University of Chicago. The purpose of these fellowships, which are open to 
holders of the doctor’s degree or its equivalent in research accomplishment, is 
to acquaint established research workers in the biological, physical, and social 
sciences with the role of modern statistical analysis in the planning of experi- 
ments and other investigative programs, and in the analysis of empirical data. 
The development of the field of Statistics has been so rapid that most current 
research falls far short of attainable standards, and these fellowships (which 
represent the fourth year of a five-year program supported by The Rockefeller 
Foundation) are intended to help reduce this lag by giving statistical training 
to scientists whose primary interests are in substantive fields rather than in 
Statistics itself. The closing date for applications is February 15, 1954; instruc- 
tions for applying may be obtained from the Committee on Statistics, Univer- 
sity of Chicago, Chicago 37. 





Fulbright Awards 


Announcement has been made of competition for Fulbright awards for uni- 
versity lecturing and advanced research of Americans in Europe and the Near 
East, Japan and Pakistan. The period of the awards will ordinarily extend from 
October 1954 to June 1955. Application must be made to Conference Board of 
Associated Research Councils, Committee on International Exchange of Persons, 
2101 Constitution Avenue, Washington 25, D. C., no later than October 15, 
1953. 
New Members 
The following persons have been elected to membership in the Institute 


May 28, 1953 to August 18, 1953 





Block, Herbert S., A.M. (Univ. of Illinois), Statistician, Goodyear Aircraft Corporation, 
1210 Massillon Road, Akron 15, Ohio. 

Burke, Paul J., Ed.M. (Harvard Univ.), Member of Technical Staff, Bell Telephone 
Laboratories, 89-20 161 St., Jamaica 32, New York. 

Daniels, H. E., Ph.D. (Univ. of Edinburgh), Lecturer in Mathematics, University of 
Cambridge, Statistical Laboratory, St. Andrews Hill, Cambridge, England. 

Dobrindt, Gerard T., B.S. (John Carroll Univ.), Graduate Fellow in Mathematics, St. 
Louis University, 359 N. Whittier, St. Louis 8, Missouri. 

Eisenberg, Harvey, M.A. (Brooklyn College), Mathematician, Evans Signal Labora- 
tories, Signal Corps Engineering Labs, Belmar, New Jersey, 15 Garfield Avenue, Avon, 
New Jersey. 

Gaver, Donald P., Jr., S.M. (M.I.T.), Staff Member, Operations Evaluation Group, 
Office of the Chief of Naval Operations, 4123 North Henderson, Arlington, Virginia. 

Gilbert, Walter M., Ph.D. (Princeton Univ.), Instructor, Mathematics Department, 
Washington State College, Pullman, Washington. 

Gilmore, John W., Ph.D. (Oxford), Director, Market Research Department, Charles 
Pfizer & Company, Inc., Brooklyn 6, New York, 440 East 56th Street, New York 22, 
New York. 

Goldsmith, Bernard P., B.S. (Univ. of New Hampshire), Quality Control Engineer, 
Raytheon Mfg. Company, Newton, Massachusetts, 40 Abbott Road, Dedham, Massa- 
chusetts. 

Hanania, Mary, M.A. (A.U.B., Beirut, Lebanon), Graduate Student, Columbia Univer- 
sity, New York, International House, Berkeley, California. 

Hop’ s, Williston C., M.S. (Howard Univ.), Associate Professor and Head of the Depart- 
ment of Mathematics, Box 6, Morris Brown College, Atlanta, Georgia. 

Ho: ., John Paul, Ph.D. (George Washington Univ.), Associate Professor of Mathematics, 
U. S. Naval Academy, Annapolis, Maryland, 1943 Fairfax Road, Annapolis, Maryland. 

Hes v, Julius F., B.S. (Univ. of Tennessee), Associate Statistician, Carbide and Carbon 
Chemicals Co., K-25 Plant, Oak Ridge, Tennessee, 3926 Greenleaf Avenue, Knoxville, 
Tennessee. 

, Burrus Sami, B.S. (Univ. of Georgia), Graduate Student, Mathematics Depart- 
ment, University of Georgia, Athens, Georgia. 

Kris, Elizabeth Christine, M.S. (Illinois Inst. of Tech.), Instructor and Assistant in 
Research, (1) Department of Psychology, Illinois Institute of Technology; (2) Re- 
search Assistant in Measurement & Statistics, Psycho-Physiological Laboratory, 
Institute for Juvenile Research, Department of Public Welfare; and (3) University 
Fellow, University of Chicago, Committee on Human Development, 5701 Maryland 
Avenue, Chicago 37, Illinois. 

Larrieu, Jean, Diplôme de l'Institut de Statistique (Paris), Ingénieur-Chercheur, Electri- 
cité de France, Service des Etudes et Recherches Hydrauliques, Paris, France, 4 Rue 
Nungesser, Fontenay-Sous-Bois, Seine, France. 

McMillan, Robert G., B.A. (Emory Univ.), Junior Statistician, Y-12 Plant, Carbide and 
Carbon Chemicals Company, Oak Ridge, Tennessee. 

Minker, Jack, M.S. (Univ. of Wisconsin), Advanced Development Engineer, RCA Victor, 
Camden, New Jersey, 105-D Wallworth Park Apartments, Haddonfield, New Jersey. 

Molitz, Hellmuth, Doktor-Ingenieur (Technische Hochschule, Berlin), Forschungsinsti- 
tut Weil am Rhein, Hauptstrasse 193, Germany. 

Monro, Sutton, B.S. (Mass. Inst. Tech.), Fundamental Quality Engineer, Bell Tele- 
phone Labs., Inc., 463 West Street, New York, New York, 103 Cypress Street, Maple- 
wood, New Jersey. 

Nicholl, Robert J., A.B. (Duke Univ.), Graduate Assistant, Department of Statistics, 
North Carolina State College, Raleigh, North Carolina. 

Nishime, Frank S., B.S. (Univ. of Illinois), Graduate Student, University of Illinois, 
108 N. Romine, Urbana, Illinois. 

Ramachandran, K. V., M.A. (Bombay Univ.), Graduate Student, Institute of Statistics, 
Chapel Hill, North Carolina, 309 Connor Dormitory, University of North Carolina, 
Chapel Hill, North Carolina. 

Randolph, Paul H., M.A. (Univ. of Minnesota), Associate Operations Analyst, Opera- 
tions Research, Armour Research Foundation of the Illinois Institute of Technology, 
Chicago, 370 Dogwood, Park Forest, Illinois. 

Ross, Willard C., Jr., M.S. (State Univ. of Iowa), Part-time Instructor in Department 
of Mathematics and Astronomy, State University of Iowa, 141 Riverside Park, Iowa 
City, Iowa. 

Sacks, Jerome, B.A. (Cornell Univ.), Graduate Student, Department of Mathematics, 
Cornell University, Ithaca, New York. 


Sands, Daniel E., M.S. (Pennsylvania State College), Graduate Assistant, Institute of 
Statistics, Department of Experimental Statistics, North Carolina State College, 
Raleigh, North Carolina, Box 5457, State College Station, Raleigh, North Carolina. 

Seal, Kiron Chandra, A.M. (Princeton Univ.), Research Assistant & Graduate Student, 
Department of Statistics, Phillips Hall, University of North Carolina, Chapel Hill, 
North Carolina. 

Striebel, Charlotte T., M.A. (Ohio State Univ.), Research Assistant, Department of 
Psychology, Ohio State University, 204 S. Cassingham Road, Columbus 9, Ohio. 

Sundrum, R. M., Ph.D. (Univ. of London), Research Associate, Institute of Statistics, 
University of North Carolina, Chapel Hill, North Carolina. 

Swanson, Margaret, B.S. (Madison College), Graduate Student and Research Assistant, 
Statistical Laboratory, University of California, 1820 Euclid Avenue, Apt. 8, Berkeley 9, 
California. 

Terpstra, T. J., M.S. (Univ. of Groningen), Chief Textile Research Laboratory, H. ten 
Cate Hzn & Co., Almelo, Holland, Brugstraat 11, Almelo, Holland. 

Van der Waerden, B. L., Ph.D. (Univ. of Amsterdam), Professor, University of Zurich, 
Rainfussweg 7, Zurich, Switzerland. 


REPORT OF THE STANFORD MEETING OF THE 
INSTITUTE 


The fifty-sixth meeting of the Institute of Mathematical Statistics was held 
at Stanford University, June 19-20, 1953, in conjunction with the annual meet- 
ing of the Biometric Society, WNAR, and the fourth West Coast Regional 
Meeting of the Econometric Society. Ninety-three persons registered, including 
the following members of the Institute: 


Om P. Aggarwal, S. G. Allen, Jr., F. C. Andrews, L. A. Aroian, R. B. Ashley, G. A. 
Baker, R. O. Been, Z. W. Birnbaum, J. R. Blum, C. H. Boll, A. H. Bowker, R. N. Bradt, 
Bernice Brown, D. G. Chapman, Herman Chernoff, C. L. Chiang, Randolph Church, E. L. 
Crow, Besse Day, W. J. Dixon, Robert Dorfman, Mary Elveback, Evelyn Fix, M. A. 
Girshick, W. C. Guenther, J. L. Hodges, Jr., P. G. Hoel, W. C. Hoffman, J. F. Hofman, 
J. M. Howell, T. A. Jeeves, Leo Katz, H. S. Konijn, L. M. Le Cam, E. L. Lehmann, G. J. 
Lieberman, R. K. Maggy, C. A. Magwire, F. J. Massey, Jr., P. L. Meyer, A. M. Mood, 
Lincoln Moses, M. E. Muller, Peter Newman, Jerzy Neyman, J. H. Powell, Joseph Putter, 
G. J. Resnikoff, R. G. Richards, Herman Rubin, E. L. Scott, F. F. Sheehan, Rosedith Sit- 
greaves, R. F. Tate, Dan Teichroew, Leo Tornqvist, Donald Truax, Elizabeth Vaughan, 
J. E. Walsh, Irving Weiss, Oscar Wesler. 


Professor G. A. Baker, University of California, Davis, was chairman of the 
opening session on Friday morning, a joint session with the Biometric Society. 
The speakers and their subjects were: 


1. Special invited address. Estimation of Biological Populations. D. G. Chapman, Uni- 
versity of Washington. 

2. Stochastic Models Related to Experimental Studies of Inter-Species and Intra-Species 
Competition. L. LeCam and J. Neyman, University of California, Berkeley. 

3. Sample Survey Techniques in Morbidity Measurement. Arthur Weissman, California 
State Department of Health. 

4. The Diffusion of Drugs into and through Tissues. D. J. Jenden, University of Cali- 
fornia Medical School, San Francisco. 
On Friday afternoon Professor M. A. Girshick, Stanford University, presided 
at a session at which the following contributed papers were presented: 


1. On the Probability Function of the Quotient of Sample Ranges from a Rectangular 
Distribution. Leo A. Aroian, Hughes Aircraft and Development Laboratories, Cul- 
ver City. 

2. Actuarial Validity of the Binomial Distribution for Large Numbers of Lives with 
Small Mortality Probabilities. John E. Walsh, U. S. Naval Ordnance Test Station, 
China Lake. 

3. On the Distribution of the Likelihood Ratio. Herman Chernoff, Stanford University. 

4. Testing the Approximate Validity of Statistical Hypotheses. J. L. Hodges, Jr., and 
Erich L. Lehmann, University of California, Berkeley. 

5. Distribution of Correlated Means. D. S. Villars, U. S. Naval Ordnance Test Station, 
China Lake. (Introduced by John E. Walsh.) 

6. On the Detection of Sure Signals in Noise. (By title.) R. C. Davis, U. S. Naval Ord- 
nance Test Station, Pasadena Annex. 

7. A Statistic Associated with the Joint Distribution of n Successive Amplitudes. Pre- 
liminary Report. William C. Hoffman, U. S. Navy Electronics Laboratory, San 
Diego. 

8. Some Two-Sample Tests Based on a Particular Measure of Discrepancy. (By title.) 
Louis H. Wegner, University of Oregon. 

9. Confidence Intervals for a Proportion. Preliminary Report. E. L. Crow, U. S. Naval 
Ordnance Test Station, China Lake. 

10. On Estimating Both Mean and Standard Deviation of a Normal Population from the 
Lowest r out of n Observations. John V. Breakwell, North American Aviation Com- 
pany, Los Angeles. (Introduced by A. M. Mood.) 

11. Strong Consistency of Stochastic Approximation Methods. (By title.) Julius R. Blum, 
University of California, Berkeley. 

12. Some Probability Results for Mortality Rates Based on Insurance Data. (By title.) 
John E. Walsh, U. S. Naval Ordnance Test Station, China Lake. 

13. Extensions of the U-Test to Three Populations. (By title.) Louis H. Wegner, Uni- 
versity of Oregon. 

14. Normal Regression Theory and Some Classical Statistics in Multivariate Analysis. 
(By title.) Junjiro Ogawa, Osaka University. (Introduced by H. Hotelling.) 

15. The Use of Maximum Likelihood Estimates in Chi Square Tests of Goodness of Fit. 
(By title.) Herman Chernoff, Stanford University, and Erich L. Lehmann, Uni- 
versity of California, Berkeley. 

16. On the Treatment of Ties in Nonparametric Tests. (By title.) Joseph Putter, Uni- 
versity of California, Berkeley. 

17. Asymptotic Relative Efficiency of Some Rank Tests for Analysis of Variance Prob- 
lems. (By title.) F. C. Andrews, Stanford University. 

18. Application of the Studentized Maximum Chi-Distribution. Preliminary Report. T. 
A. Jeeves, University of California, Berkeley. 


Saturday morning, Professor Robert Dorfman, University of California, 
Berkeley, was chairman of a joint session with the Econometric Society. The 
following papers were presented: 


1. Estimating Individual Behavior Patterns from Aggregate Data. Peter Newman, Nuffield 
College, Oxford, and Stanford University. 

2. Applications of Multivariate Analysis. Richard O. Been, University of California, 
Berkeley. 
Saturday afternoon, Professor Paul G. Hoel, University of California, Los 
Angeles, presided at the session on the Power of Nonparametric Tests. The 
speakers and their subjects were: 


1. Local Large Sample Power of Some Two-Sample Tests Against Normal Alternatives. 
A. M. Mood, Rand Corporation. 


2. Asymptotic Efficiency of Nonparametric Tests. Erich L. Lehmann, University of Cali- 
fornia, Berkeley. 

3. On the Treatment of Ties in Nonparametric Tests. Joseph Putter, University of Cali- 
fornia, Berkeley. 

4. Asymptotic Relative Efficiency of Some Rank Tests for Analysis of Variance Problems. 
F. C. Andrews, Stanford University. 

5. Power Efficiency Against Normal Alternatives for Certain Two-Sample Tests. W. J. 
Dixon, University of Oregon. 


An informal beer party was held on Friday evening. 
Rosedith Sitgreaves 
Assistant Secretary 




REPORT OF THE KINGSTON MEETING OF THE INSTITUTE 


The fifty-seventh meeting and fifteenth summer meeting of the Institute of 
Mathematical Statistics was held in Kingston, Ontario, on August 31-Septem- 
ber 4, 1953. The meeting was held in conjunction with meetings of the American 
Mathematical Society, the Mathematical Association of America, and the Econo- 
metric Society. An invited address was given by Professor H. O. Hartley on 
Experimental Sampling with Control Variables. On the afternoon of September 2, 
the members had a choice of a trip by motorboat among the Thousand Islands 
or a trip to the Sand Banks and swimming beach. On the evening of September 
2 there was a showing of Canadian films. On the evening of September 3 the 
members were guests of Queen’s University and the Canadian Mathematical 
Congress at a theatre party. 

Approximately 550 persons attended the meetings, including the following 98 
members of the Institute: 


Carl B. Allendoerfer, Sigurd L. Andersen, R. L. Anderson, Harvey J. Arnold, Herbert 
E. Arnold, Kenneth J. Arnold, J. D. Bankier, Robert E. Bechhofer, Allan Birnbaum, David 
Blackwell, Julius R. Blum, Colin R. Blyth, R. C. Bose, Ralph A. Bradley, Margaret K. 
Butler, Randolph Church, A. Bruce Clarke, A. C. Cohen, Jr., Randal H. Cole, T. Freeman 
Cope, A. H. Copeland, Cecil C. Craig, George B. Dantzig, D. B. DeLury, Cyrus Derman, 
Tom Donnelly, George L. Edgett, Herbert P. Evans, Robert M. Exselsen, C. H. Fischer, 
J. Sutherland Frame, D. A. S. Fraser, Ramon G. Gamoneda, Harry M. Gehman, B. C. 
Getchell, J. A. Greenwood, Irwin Guttman, John S. Hagan, J. F. Hannan, M. H. Hansen, 
Bertha I. Hart, H. Leon Harter, H. O. Hartley, Wassily Hoeffding, Robert Hogg, W. C. 
Hood, H. S. Houthakker, Stanley Isaacson, J. E. Jackson, Walter Jacobs, T. J. Jaramillo, 
A. E. Karp, Leo Katz, E. S. Keeping, Bradford F. Kimball, T. C. Koopmans, Carl F. Kos- 
sack, E. Christine Kris, Solomon Kullback, O. E. Lancaster, F. C. Leone, Julius Lieblein, 
H. T. McAdams, G. E. McCreary, William G. Madow, Kenneth O. May, John W. Mayne, 
Paul Meier, Elmer B. Mode, Jack Moshman, Shu-Teh Chen Moy, C. R. Newell, J. Neyman, 
E. G. Olds, Toby Oxtoby, Emanuel Parzen, W. E. Patte, James H. Powell, G. Baley Price, 
Herbert Robbins, S. N. Roy, Herman Rubin, Evelyn L. Rumer, Rosedith Sitgreaves, 
Milton Sobel, Paul M. Somerville, Henry Teicher, R. M. Thrall, Marrian M. Torrey, A. 
W. Tucker, John W. Tukey, John R. B. Whittlesey, R. Wormleighton. 


The Program of the meeting was as follows: 


TUESDAY, SEPTEMBER 1, 1953 


Some Extreme Value Problems. 9:00 A.M.-10:50 A.M. 


Chairman: Allan Birnbaum, Columbia University. 

The Problem of Maximum Coincident Values and Application of Extreme Value Theory. 
Emil H. Jebe, Iowa State College. (Read by H. O. Hartley, Iowa State College and 
University College, London.) 

Estimation of Extremal Parameters by Use of Order Statistics. Julius Lieblein, Statistical 
Engineering Laboratory, National Bureau of Standards. 

Discussion: Bradford F. Kimball, Public Service Commission of the State of New York. 


“Robust” Tests. 11:00 A.M.-12:50 P.M. 


Chairman: R. L. Anderson, University of North Carolina. 

Tests on Variances and Their Sensitivity to Nonnormality. G. E. P. Box, University of 
North Carolina. (Introduced by John W. Tukey.) 

An Investigation of a “Robust” Test on Variances. Sigurd L. Andersen, University of 
North Carolina. 

Discussion: R. A. Bradley, Virginia Polytechnic Institute, and John W. Tukey, Prince- 
ton University. 


Linear Programming. 2:00 P.M.-3:50 P.M. 


Co-Sponsor: Econometric Society. 

Chairman: T. C. Koopmans, Cowles Commission for Research in Economics. 

Constrained Games and Linear Programming. A. Charnes and W. W. Cooper, Carnegie 
Institute of Technology. 

The Selection of Farm Enterprises: A Case Study in Linear Programming. R. J. Freund and 
R. A. King, University of North Carolina. 

Transformed Problems. H. D. Mills, Princeton University. 

Elementary Proof of the Min-Max Theorem of Games. George Dantzig, The Rand Corporation. 

Discussion: A. W. Tucker, Princeton University. 


Contributed Papers I. 4:30 P.M.-6:20 P.M. 


Chairman: Milton Sobel, Cornell University. 
Papers: 
(1) Sequential Probability Ratio Confidence Sets (Preliminary Report). Allan Birnbaum,
Columbia University.

(2) Optimum Sample Size for Choosing the Population Having the Smaller Variance.
Paul N. Somerville, University of North Carolina and Virginia Polytechnic Institute.

(3) The Generation of Pseudo-Random Numbers on a Decimal Calculator. Jack Moshman,
Oak Ridge National Laboratory.

(4) On the Integral Solution of Pearson’s Random Walk Problem and Related Matters.
David Durand, National Bureau of Economic Research, and J. Arthur Greenwood,
Manhattan Life Insurance Company.

(5) On Optimal Systems. David Blackwell, Howard University.

(6) Maximum Likelihood Regression Equations. H. Leon Harter, Wright-Patterson Air
Force Base.

(7) Spherical Distributions (Preliminary Report). G. E. P. Box, University of North
Carolina.

(8) On the Monotonic Character of the Power of a Certain Test in Multivariate Analysis
of Variance. S. N. Roy, University of North Carolina.

(9) Some Large-Sample Results on Estimation and Power for a Method of Paired
Comparisons. (Preliminary Report.) Ralph Allan Bradley, Virginia Polytechnic Institute.

(10) Nonparametric Estimation of Survivorship. (By title.) Paul Meier, Johns Hopkins
University.

(11) Comparison of Two Rank Order Tests for the Two-Sample Problem. (By title). Gott- 
fried E. Noether, Boston University. 

(12) The Poisson Distribution as a Limit of Dependent Binomial Distributions with Un- 
equal Probabilities. (By title.) John E. Walsh, U.S. Naval Ordnance Test Station, 
Inyokern. 

(13) An Estimate of the Number of States in a Discrete Markov Chain. (By title.) A. T. Reid, 
University of Chicago. 

(14) On a Test of the Rank of a Matrix of Means for k p-Variate Normal Populations. 
(By title.) S. N. Roy, University of North Carolina.

(15) On the Monotonic Character of the Power of a Test of Independence in Multivariate 
Analysis. (By title.) S. N. Roy, University of North Carolina. 


Council Meeting. 8:00 P.M. 


WEDNESDAY, SEPTEMBER 2, 1953 


Invited Address. 10:00 A.M. 


Chairman: Robert E. Bechhofer, Cornell University. 
Experimental Sampling with Control Variables. H. O. Hartley, Iowa State College and 
University College, London. 


THURSDAY, SEPTEMBER 3, 1953 


Multiple Decision Procedures. 10:00 A.M.-11:50 A.M. 


Chairman: George L. Edgett, Queen’s University. 

Single-sample and Two-sample Procedures for Ranking Populations According to an Un- 
known Parameter. Milton Sobel, Cornell University. 

Sequential Procedures for Ranking Populations According to an Unknown Parameter. 
Robert E. Bechhofer, Cornell University. 

Optimum Sample Size for Choosing the Largest of k + 1 Parameters. Paul N. Somerville, 
University of North Carolina.

Discussion: R. C. Bose and S. N. Roy, University of North Carolina.


Recent Advances in Mathematical Statistics. 2:00 P.M.-3:50 P.M. 


Chairman: Carl F. Kossack, Purdue University. 

Nonparametric Methods, Quality Control, Biological Statistics. C. C. Craig, University
of Michigan. 

Experimental Design, Survey Theory. R. C. Bose, University of North Carolina. 

Testing Hypotheses, Estimation Techniques, Distribution Theory. Henry Teicher, Purdue 
University.


Business Meeting. 4:00 P.M.


Contributed Papers II. 4:30 P.M.-6:20 P.M.


Chairman: D. B. DeLury, Ontario Research Foundation. 
Papers: 

(1) The Asymptotic Variance of Estimates of the Mean Life of a Radioactive Source.
(Preliminary Report.) Richard F. Link, Princeton University.

(2) Testing the Equality of Means of Rectangular Populations. Robert V. Hogg, State
University of Iowa.

(3) The Structure of the Sample Space for Group Organization Theory. Leo Katz and
James H. Powell, Michigan State College.

(4) A Family of Cumulative Frequency Functions for J-Shaped Frequency Functions.
C. W. Topp and F. C. Leone, Case Institute of Technology.

(5) Multilayer Significance Procedures. (Preliminary Report.) John W. Tukey, Princeton
University.

(6) Estimation in Truncated Multivariate Normal Distributions. A. C. Cohen, Jr.,
University of Georgia.

(7) The Extrema of Certain Functionals of Distribution Functions (Preliminary Report).
Wassily Hoeffding, University of North Carolina.

(8) Probability Distributions Related to Random Transformations of a Finite Set
(Preliminary Report). H. Rubin and R. Sitgreaves, Stanford University.

(9) Characterization of Tolerance Regions. (By title.) D. A. S. Fraser, University of
Toronto.

(10) A Nonparametric Model for the Linear Hypothesis. (By title.) D. A. S. Fraser,
University of Toronto.

(11) On the Analysis of Diurnal Fluctuations in Physiological States and Performance.
(Preliminary Report.) (By title.) Christine Kris, Illinois Institute of Technology
and University of Chicago.


George L. Edgett
Assistant Secretary 




MINUTES OF THE BUSINESS MEETING, 
KINGSTON, ONTARIO, SEPTEMBER 3, 1953 


The business meeting of the Institute of Mathematical Statistics was called to 
order by President Morris H. Hansen at 4:05 P.M., in Miller Hall, Queen’s 
University, Kingston, Ontario. Approximately forty members were present. 

The President reported to the business meeting that the financial position of 
the Institute continues good with a continuing growth in membership. In fact, 
the present healthy financial position raises the question of the disposition of the 
excess of current income over current expenses. There was general agreement in 
the Council that it is desirable to acquire a moderate surplus, equivalent at least 
to two or three years' annual income, in order to have stability in meeting crises
of the sort that occurred a few years ago. Beyond this, our net income position
poses for consideration the question of possible reduction in dues against
increased Institute activities.


One step along this line is reported below in the form of an amendment to the
Bylaws providing for reductions in the dues for students and for members residing
outside the Continental U.S. and Canada.

Another step is the authorization by the Council of two committees, one a com- 
mittee to advise on desirable surplus levels, investment, and dues structure, and 
the other a committee to consider new areas of activity and development. Such 
committees will be appointed shortly and should have the benefit of suggestions 
from the membership. 


There has been considerable discussion recently concerning the desirability of 
scheduling the annual meeting at a time other than during the week between 
Christmas and New Year's. This discussion has led to authorization by the Council
of a committee to explore this problem with the membership and with other
societies with whom we meet. Members may wish to give some thought to this 
question and make their wishes known. 

An especially significant action was taken at Kingston by the Council, an 
action designed to facilitate carrying out the business of the Institute by insuring 
that meetings would be held under conditions that make possible the full partici- 
pation of all members. To this end the following resolution was adopted unani- 
mously by the Council. 


It is the policy of the Institute of Mathematical Statistics that all its meetings shall be 
held on a completely nonsegregated basis. In particular, prior to determining the place of a 
forthcoming meeting, the Secretary of the Institute of Mathematical Statistics shall ascer- 
tain that meeting halls, eating facilities and housing accommodations adequate for the 
expected attendance will be available on a nonsegregated basis and that all social events 
connected with the meetings shall be nonsegregated. 


Professor Ralph A. Bradley next reported on plans for future meetings of the 
Institute. 

President-Elect Edwin G. Olds announced the appointment of Professor 
Donald A. S. Fraser as Program Coordinator for 1954. 

The Secretary moved the adoption of the following amendment to the Con- 
stitution: 


That, in the Constitution, Article 6, the word “sixty” be changed to “ninety” and the
word “thirty” to “forty-five.”


This motion was seconded and passed. The amendment having been recom- 
mended by the Executive Committee and notice having been sent to each 
member by the Secretary at least thirty days before the date of the meeting,
Article 6 of the Constitution was thereby amended to read: 


The president shall appoint a Nominating Committee and shall announce their names at 
the annual meeting when he retires from office. This committee shall submit to the members, 
through the Secretary and at least ninety days before the closing of polls at the next
succeeding annual meeting, one nomination for President-Elect and a slate containing at least
twice as many names as there are vacancies on the Council. 

Additional nominations may be made for President-Elect or for the Council by a petition
signed by twenty members. Such nominations shall appear on the ballot if they are in the
hands of the Secretary at least forty-five days before the closing of polls at the next
succeeding annual meeting. In any event, members may vote for names in addition to those
nominated.


The Secretary moved the adoption of the following amendment to the Bylaws:


That, in the Bylaws, Article 1, the words “after it has been audited by a member or
members appointed by the President, to whom such member or members shall report” be
deleted and the sentence “The financial statement shall be audited within three months of
the close of the fiscal year by a person or persons appointed by the President, to whom such
person or persons shall report.” inserted.


This motion was seconded and passed. The relevant paragraph of Article 1 of the
Bylaws was thereby amended to read: 


The Treasurer shall send out calls for annual dues, pay all bills for expenditures
authorized by the Institute, Council or Executive Committee; keep a detailed account of all
receipts and expenditures; prepare a financial statement at the end of each fiscal year and 
present an abstract of same at a business meeting of the Institute. The financial statement 
shall be audited within three months of the close of the fiscal year by a person or persons 
appointed by the President, to whom such person or persons shall report. 


The Secretary moved the adoption of the following amendment to the Bylaws:


That in Article 2 of the Bylaws (1) the word “seven” in the second sentence be changed
to “six”, (2) the word “seven” in Exception D be changed to “six”, (3) the word “seven”
in Exception E be changed to “six.”


This motion was seconded and passed. The amendment having been recom- 
mended by the Council, the opening section of Article 2 was thereby amended to 
read: 

Members shall pay ten dollars at the time of admission to membership and shall receive
the full current volume of the official journal. Thereafter members shall pay ten dollars
annual dues, of which six dollars shall be for a subscription to the official journal. There
shall be the following exceptions:


Exception D was thereby amended to read:


Any Member who resides outside the United States and Canada shall pay five dollars 
annual dues.


Exception E was thereby amended to read:


Any Member who is a bona fide student (as certified by a member of the faculty of his 
institution) may pay six dollars annual dues. Members may not take advantage of this rate
for more than four years. 


During the presentation of the last amendment, the chair was turned over to 
President-Elect E. G. Olds. 
The meeting adjourned at 4:35 P.M. 





PUBLICATIONS RECEIVED 




Allen, R. G. D. and Ely, J. E., International Trade Statistics, John Wiley and Sons, Inc.,
New York, 1953, xii + 448 pp., $7.50.

Bulletin of the Department of Anthropology, Vol. 1, No. 1, January, 1952, Government of
India Press, Calcutta, 159 pp., 10 sh.

Burr, I. W., Engineering Statistics and Quality Control, McGraw-Hill Book Company,
Inc., New York, 1953, xi + 442 pp., $7.00.

Hansen, M. H., W. N. Hurwitz and W. G. Madow, Sample Survey Methods and Theory,
Vols. I and II, John Wiley and Sons, Inc., New York, 1953, xii + 638 pp. and
xiii + 332 pp., $7.00.

Revue de Statistique Appliquée, Vol. 1, No. 1, 1953, Centre de Formation des Ingénieurs et
Cadres aux Applications Industrielles de la Statistique, 103 pp.

Romig, H. G., 50-100 Binomial Tables, John Wiley and Sons, Inc., New York, 1953,
xxvii + 172 pp., $4.00.

Tintner, Gerhard, Mathematics and Statistics for Economists, Rinehart and Co., Inc., New
York, 1953, xiv + 363 pp., $6.50.

Walker, Helen M. and Lev, Joseph, Statistical Inference, Henry Holt and Co., New
York, 1953, xi + 510 pp., $6.25.

Yokohama Mathematical Journal, Vol. 1, No. 1, May, 1953, Yokohama Municipal University,
Yokohama, 129 pp.




INSTITUTIONAL MEMBERS 


The following are Institutional Members of the Institute for the year 1953: 
International Business Machines Corporation, New York
Princeton University, Department of Mathematics, Section of Mathematical
Statistics, Princeton, New Jersey
Purdue University Libraries, Lafayette, Indiana
Raytheon Manufacturing Company, Newton, Massachusetts
University of California, Statistical Laboratory, Berkeley, California
University of Illinois, Urbana, Illinois
University of North Carolina, Institute of Statistics, Chapel Hill, North Carolina
University of Washington, Laboratory of Statistical Research, Seattle, Washington





TRABAJOS DE ESTADISTICA 


Review published by “Departamento de Estadística” of the “Consejo Superior de
Investigaciones Científicas,” Madrid, Spain.


Vol. IV CONTENTS Cuad. I 
R. Fortet ..... Procesos estocásticos en cascada.
S. Rios ..... Algunas leyes de probabilidad y procesos estocásticos que se reducen
a un tipo general de Laplace-Stieltjes.
J. Gil Peláez ..... Las funciones absolutas en la Estadística.
A. N. Kolmogoroff ..... Sucesiones estacionarias en espacios de Hilbert.
J. Tena ..... Sobrevisión por muestreo en la Universidad de Madrid.
Notas. Crónicas. Bibliografía. Cuestiones.


For everything in connection with works, exchanges and subscriptions, write to Prof.
Sixto Rios, Departamento de Estadística del Consejo Superior de Investigaciones
Científicas, Serrano 123, Madrid, Spain. The Review is composed of three fascicles
published quarterly (about 350 pages) and its price is 80 pts. for Spain and South
America and 3 American Dollars for all other countries.


JOURNAL OF THE
AMERICAN STATISTICAL ASSOCIATION
December, 1953


1108 16th St., N.W. Washington 6, D. C. VOL. 48 NO. 264 


Recent Advances in Finding Best Operating Conditions. R. L. ANDERSON
The Problem of Autocorrelation in Regression Analysis. R. L. ANDERSON
Percentage Points of the Incomplete Beta Function. ROBERT E. CLARK
Statistical Problems of the Kinsey Report.
WILLIAM G. COCHRAN, FREDERICK MOSTELLER, AND JOHN W. TUKEY
On a Probability Mechanism to Attain Economic Control of the Resultant Error of Response
and the Bias of Nonresponse. W. EDWARDS DEMING
A Note on Regression When There Is Extraneous Information about One of the Coefficients